ABSTRACT: This paper develops a general approach to robust, regression-based
specification  tests  for  (possibly)  dynamic  econometric  models.   The  key 
feature  of  the  proposed  tests  is  that,  in  addition  to  estimation  under  the 
null  hypothesis,  computation  requires  only  a  matrix  linear  least  squares 
regression  and  then  an  ordinary  least  squares  regression  similar  to  those 
employed  in  popular  nonrobust  tests.   For  the  leading  cases  of  conditional 
mean  and/or  conditional  variance  tests,  the  proposed  statistics  are  robust  to 
departures  from  distributional  assumptions  that  are  not  being  tested. 
Moreover, the statistics can be computed using any $\sqrt{T}$-consistent estimator,
resulting in significant simplifications in some otherwise difficult
contexts. Among the examples covered are conditional mean tests for models
estimated by weighted nonlinear least squares under misspecification of the
conditional variance, tests of jointly parameterized conditional means and
variances estimated by quasi-maximum likelihood under nonnormality, and some
new,  computationally  simple  specification  tests  for  the  tobit  model. 


1. Introduction

Specification  testing  has  become  an  integral  part  of  the  econometric 
model  building  process.   The  literature  is  extensive,  and  model  diagnostics 
are available for most procedures used by applied econometricians. The most
popular  specification  tests  are  those  that  can  be  computed  via  ordinary  least 
squares  regressions.   Examples  are  the  Lagrange  Multiplier  (LM)  test  for 
nested  hypotheses,  versions  of  Hausman's  [13]  specification  tests,  White's 
[24]  information  matrix  (IM)  test,  and  regression-based  versions  of  various 
nonnested  hypotheses  tests.   In  fact,  Newey  [17],  Tauchen  [21],  and  White 
[26]  have  shown  that  all  of  these  tests  are  asymptotically  equivalent  to  a 
particular  conditional  moment  (CM)  test.   In  a  maximum  likelihood  setting 
with independent observations, Newey [17] and Tauchen [21] have devised outer
product-type auxiliary regressions for computing CM tests. White [26] has
extended  these  methods  to  a  general  dynamic  setting. 

The  simplicity  of  most  popular  regression-based  procedures  currently 
employed,  including  the  Newey-Tauchen-White  (NTW)  procedure,  is  not  without 
cost.   In  many  cases  the  validity  of  these  tests  relies  on  certain  auxiliary 
assumptions  holding  in  addition  to  the  relevant  null  hypothesis.   For 
example,  in  a  nonlinear  regression  framework  where  the  dynamic  regression 
function  is  correctly  specified  under  the  null  hypothesis,  the  usual  LM 
regression-based statistic is invalid in the presence of conditional or
unconditional heteroskedasticity. Except in special cases the NTW outer
product statistic is also invalid. Other examples include the various tests
for heteroskedasticity: currently used regression forms require constancy of
the conditional fourth moment of the regression errors under the null
hypothesis.   In  addition,  the  Lagrange  Multiplier  and  other  CM  tests  for 
jointly  parameterized  conditional  means  and  variances  are  inappropriate  under 
various  departures  from  normality. 

The  above  situations  are  all  characterized  by  the  same  feature: 
validity  of  the  tests  requires  imposition  of  more  than  just  the  hypotheses  of 
interest under $H_0$. In addition, traditional econometric testing procedures
require  that  the  estimators  used  to  compute  the  statistics  are  efficient  (in 
some  sense)  under  the  null  hypothesis.   It  is  important  to  stress  that  this 
is  not  merely  nitpicking  about  regularity  conditions. 

Due primarily to the work of White [22,23,24,26], Domowitz and White
[7],  Hansen  [11],  and  Newey  [17],  there  now  exist  general  methods  of 
computing  robust  statistics.   In  the  context  of  linear  regression  models, 
Pagan  and  Hall  [19]  discuss  how  to  compute  conditional  mean  tests  that  are 
robust to heteroskedasticity. Their discussion centers around the use of the
White  [22]  heteroskedasticity-consistent  standard  errors.   For  one  degree  of 
freedom  tests  the  Pagan  and  Hall  suggestion  leads  to  easily  computable  tests, 
and  it  is  certainly  an  alternative  to  the  current  approach  for  regression 
models.   But  computation  of  the  statistics  for  tests  with  more  than  one 
degree  of  freedom  requires  explicit  inversion  of  the  White  covariance  matrix 
estimator;  this  matrix  must  then  be  used  in  a  quadratic  form  to  obtain  the 
heteroskedasticity-robust Wald statistic. The tests proposed here are very
much in the spirit of the LM approach: computation requires estimation of
the model only under the null, so that any particular model can be subjected
to a battery of robust specification tests without ever reestimating the
model. More importantly, the tests can be computed using any standard
regression  package. 

Although  there  are  some  fairly  general  formulas  available  for  robust  LM 
statistics  (e.g.  Engle  [9],  White  [24,25]),  formulas  for  general  nonlinear 
restrictions  involve  an  analytical  expression  for  the  derivative  of  the 
implicit  constraint  function  and  a  generalized  inverse.   In  specific 
instances  computationally  simple  robust  LM  statistics  are  available.   A 
notable  example  is  the  paper  by  Davidson  and  MacKinnon  [6],  which  develops  a 
regression-based  heteroskedasticity-robust  LM  test  in  a  nonlinear  regression 
model with independent errors and unconditional heteroskedasticity.

It  is  a  safe  bet  that  the  substantial  analytical  and  computational  work 
required  to  obtain  robust  statistics  is  a  primary  reason  that  they  are  used 
infrequently  in  applied  work.   Evidence  of  this  statement  is  the  growing  use 
of the White [22] heteroskedasticity-robust t-statistics, which are now
computed by many econometrics packages. Only occasionally does one see an LM
test, a Hausman test, or a nonnested hypothesis test carried out in a manner
that is robust to second moment misspecification. This is unfortunate since
these tests are inconsistent for the alternative that the conditional mean is
correctly specified but the conditional variance is misspecified. In other
words, the standard forms of well known tests can result in inference with
the wrong asymptotic size while having no systematic power for testing the
auxiliary assumptions that are imposed in addition to $H_0$.

This  paper  develops  a  unified  approach  to  calculating  robust  statistics 
via  least  squares  regressions  which  I  believe  is  easily  accessible  to  applied 
econometricians. The general method suggested here can be viewed as an
extension of the Davidson and MacKinnon [6] approach. In fact, in the
context of nonlinear regression models, their procedure is shown to be valid
for quite general dynamic models with conditional as well as unconditional
heteroskedasticity. In the same context the approach here can be viewed
as  the  Lagrange  Multiplier  version  of  the  robust  Wald  strategy  suggested  by 
Pagan  and  Hall  [19].   This  paper  also  extends  Wooldridge's  [30]  robust, 
regression  based  conditional  mean  and  conditional  variance  tests  in  the 
context of quasi-maximum likelihood estimation in multivariate linear
exponential  families.   The  current  framework  is  more  general  because  it 
applies  anytime  a  generalized  residual  function  (defined  in  section  2)  is 
the basis for the test.

For  the  leading  cases  of  conditional  first  and  second  moments,  the 
regression-based  tests  proposed  maintain  only  the  hypotheses  of  interest 
under  the  null,  and  they  are  applicable  to  specification  testing  of  dynamic 
multivariate  models  of  first  and  second  moments  without  imposing  further 
assumptions  on  the  conditional  distribution  (except  regularity  conditions). 
Moreover,  in  classical  situations,  these  tests  are  asymptotically  equivalent 
under  the  null  and  local  alternatives  to  their  traditional  counterparts. 
Robustness  is  obtained  without  sacrificing  asymptotic  efficiency. 

For  some  specification  tests  the  current  approach  does  impose  auxiliary 
assumptions  under  the  null  hypothesis.   This  is  the  price  one  pays  for  the 
regression-based nature of the tests. Still, in most cases encountered so
far, the current framework imposes fewer auxiliary assumptions under the null
than popular nonrobust tests. This does not mean that robust tests are not
available in such circumstances, but only that regression-based forms of
these tests are not known. The goal here is to provide a unified approach to
robust, regression-based specification tests, and not to robust tests in
general.   Nevertheless,  the  coverage  is  fairly  broad.   A  general  treatment  of 
robust  tests  is  contained  in  White  [26]. 

A  second  aspect  of  the  proposed  statistics  is  that  they  may  be  computed 
using any $\sqrt{T}$-consistent estimator. The asymptotic distribution of the test
statistic  under  the  null  and  local  alternatives  is  invariant  with  respect  to 
the  asymptotic  distribution  of  the  estimators  used  in  computation;  this  can 
be  viewed  as  another  kind  of  robustness.   Consequently,  the  methodology  leads 
to  some  interesting  new  tests  in  cases  where  the  computational  burden  based 
on  previous  approaches  can  be  prohibitive.   This  is  true  whether  or  not 
robustness  to  violation  of  auxiliary  assumptions  is  an  issue;  in  fact,  the 
procedure  can  be  profitably  applied  to  situations  which  assume  correct 
specification  of  the  entire  conditional  distribution  provided  that  the  test 
statistic  can  be  put  into  the  form  considered  in  section  2.   In  such  cases 
the proposed tests have properties similar to Neyman's [18] C($\alpha$) tests, but
they are applicable whether or not the score of the log-likelihood is
the basis for the test statistic. When restricted to LM tests, the new
statistics offer generalized residual alternatives to outer product-type C($\alpha$)
statistics, provided of course that the score of the log-likelihood can be
put  into  the  appropriate  generalized  residual  form. 

Section 2 of the paper discusses the setup and the general results,
section 3 illustrates the scope of the methodology with several examples, and
section 4 contains concluding remarks. Regularity conditions and proofs are
contained in an appendix.


2. General Results

Let $\{(y_t, z_t): t=1,2,\ldots\}$ be a sequence of observable random vectors with
$y_t$ $1 \times J$, $z_t$ $1 \times K$. $y_t$ is the vector of endogenous variables. Interest lies in
explaining $y_t$ in terms of the explanatory variables $z_t$ and (in a time series
context) past values of $y_t$ and $z_t$. For time series applications, let
$x_t \equiv (z_t, y_{t-1}, z_{t-1}, \ldots, y_1, z_1)$ denote the predetermined variables. Note that
current $z_t$ can be excluded from $x_t$ or, if there are no "exogenous" variables,
one may take $x_t \equiv (y_{t-1}, y_{t-2}, \ldots, y_1)$. For cross section applications set
$x_t \equiv z_t$ and assume that the observations are independently distributed.

The conditional distribution of $y_t$ given $x_t$ always exists and is denoted
$D_t(\cdot|x_t)$. Assume that the researcher is interested in testing hypotheses
about a certain aspect of $D_t$, for example the conditional expectation and/or
the conditional variance. Note that, because at time $t$ the conditioning set
contains $\{(y_{t-1}, z_{t-1}), \ldots, (y_1, z_1)\}$ or $\{y_{t-1}, y_{t-2}, \ldots, y_1\}$, the assumption is
that interest lies in getting the dynamics of the relevant aspects of $D_t$
correctly specified. For cross section applications this point is
irrelevant.

For  motivational  purposes  and  to  illustrate  the  notation,  it  is  useful 
to  introduce  a  couple  of  examples.   The  first  example  concerns  specification 
testing of a conditional mean. Suppose interest lies in testing hypotheses
about the conditional expectation of $y_t$ (taken to be a scalar for simplicity)
given $x_t$. The parametric model is

$$\{m_t(x_t,\alpha): \alpha \in A,\ t=1,2,\ldots\}, \qquad (2.1)$$

where $A \subset \mathbb{R}^P$, and the null hypothesis is

$$H_0: E(y_t|x_t) = m_t(x_t,\alpha_o), \text{ some } \alpha_o \in A,\ t=1,2,\ldots. \qquad (2.2)$$

If $\hat\alpha_T$ is a $\sqrt{T}$-consistent estimator of $\alpha_o$ under $H_0$ then the residuals are
defined as $\hat u_t \equiv u_t(y_t,x_t,\hat\alpha_T) \equiv y_t - m_t(x_t,\hat\alpha_T)$. A test of $H_0$ can be based on the
sample covariance

$$T^{-1}\sum_{t=1}^T \lambda_t(x_t,\hat\alpha_T,\hat\pi_T)'\,u_t(y_t,x_t,\hat\alpha_T) \qquad (2.3)$$

$$\equiv T^{-1}\sum_{t=1}^T \hat\lambda_t'\hat u_t \qquad (2.4)$$

where $\lambda_t(x_t,\alpha,\pi)$ is a $1 \times Q$ vector function of "misspecification indicators"
that can depend on $\hat\alpha_T$ and a nuisance parameter estimator $\hat\pi_T$. The standard LM
approach leads to a test based on the (uncentered) r-squared from the
regression

$$\hat u_t \ \text{on}\ \nabla_\alpha \hat m_t,\ \hat\lambda_t, \qquad t=1,\ldots,T. \qquad (2.5)$$

If $\hat\alpha_T$ is asymptotically equivalent to the NLS estimator then under $H_0$ and
conditional homoskedasticity, $TR_u^2$ is asymptotically $\chi^2_Q$. Thus, the LM
approach effectively takes the null hypothesis to be

$$H_0': H_0 \text{ holds and } V(y_t|x_t) = \sigma_o^2 \text{ for some } \sigma_o^2 > 0,\ t=1,2,\ldots \qquad (2.6)$$

but it is inconsistent for the alternative

$$H_1': H_0 \text{ holds but } H_0' \text{ does not.}$$

It also essentially requires that $\hat\alpha_T$ be the NLS estimator.
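As a rough illustration of regression (2.5), the following Python sketch computes $TR_u^2$ for a linear mean function. The function name `lm_stat`, the simulated data, and the particular indicators are hypothetical choices made for illustration and are not taken from the paper; NumPy is assumed to be available. The statistic is only appropriate under $H_0$ together with conditional homoskedasticity, as discussed above.

```python
import numpy as np

def lm_stat(u_hat, grad_m, lam):
    """Nonrobust LM statistic: T times the uncentered R-squared from
    regressing u_hat on (grad_m, lam), as in regression (2.5)."""
    u_hat = np.asarray(u_hat, dtype=float)
    W = np.column_stack([grad_m, lam])            # T x (P + Q) regressors
    coef, *_ = np.linalg.lstsq(W, u_hat, rcond=None)
    resid = u_hat - W @ coef
    r2_uncentered = 1.0 - (resid @ resid) / (u_hat @ u_hat)
    return u_hat.shape[0] * r2_uncentered

# Hypothetical data: linear null model with homoskedastic errors.
rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.normal(size=T)
X = np.column_stack([np.ones(T), x])              # gradient of m_t in a linear model
alpha_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ alpha_hat
lam = np.column_stack([x**2, x**3])               # Q = 2 misspecification indicators
lm = lm_stat(u_hat, X, lam)
print(f"TR_u^2 = {lm:.3f} (compare with a chi-square with Q = 2 d.f.)")
```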

The Newey-Tauchen-White regression for the same problem is

$$1 \ \text{on}\ \hat u_t\nabla_\alpha \hat m_t,\ \hat u_t\hat\lambda_t, \qquad t=1,\ldots,T. \qquad (2.7)$$

In general, $H_0'$ is also required for $TR_u^2$ from this regression to be
asymptotically $\chi^2_Q$, although there are some cases, such as testing for serial
correlation in a static regression model with static conditional
heteroskedasticity (i.e. $V(y_t|x_t) = V(y_t|z_t)$), where the NTW regression is
robust. The validity of the NTW procedure also generally relies on $\hat\alpha_T$ being
the NLS estimator.

As  pointed  out  by  Pagan  and  Hall  [19],  a  robust  test  is  available  from 
the  regression  (2.5).   The  White  [22]  heteroskedasticity-robust  covariance 
matrix  estimator  can  be  used  to  compute  a  robust  Wald  statistic  for  the 

A  A 

hypothesis  that  A   can  be  excluded  from  the  regression  (2.5).   When  A   is  a 
scalar  this  is  simple  because  a  robust  test  statistic  is  simply  the  robust 

A  A 

t-statistic  on  A  .   When  A   is  a  vector  computation  of  the  robust  Wald 
statistic  is  somewhat  more  complicated  since  it  involves  inversion  of  the 
White  covariance  matrix  estimator  as  well  as  explicit  construction  of  the 
appropriate  quadratic  form.   In  addition,  the  Wald  procedure  is  valid 

A 

essentially  only  when  a      is  the  NLS  estimator. 

The  regression-based  heteroskedasticity-robust  form  of  the  test,  which 
in addition is valid for any $\sqrt{T}$-consistent estimator, is a special case of
Example  3.1  discussed  in  section  3. 

As a second example, consider testing for heteroskedasticity. The null
hypothesis is taken to be

$$H_0: E(y_t|x_t) = m_t(x_t,\alpha_o) \text{ and } V(y_t|x_t) = \sigma_o^2,\ \alpha_o \in A,\ \sigma_o^2 > 0,\ t=1,2,\ldots.$$

Again let $u_t(y_t,x_t,\alpha)$ be the residual function, and let $\lambda_t(x_t,\theta,\pi)$ be a $1 \times Q$
vector of heteroskedasticity indicators, where $\theta \equiv (\alpha',\sigma^2)'$. A general class
of tests is based on

$$T^{-1}\sum_{t=1}^T \lambda_t(x_t,\hat\theta_T,\hat\pi_T)'\,[u_t^2(y_t,x_t,\hat\alpha_T) - \hat\sigma_T^2] \equiv T^{-1}\sum_{t=1}^T \hat\lambda_t'(\hat u_t^2 - \hat\sigma_T^2)$$


where $\hat\sigma_T^2 \equiv T^{-1}\sum_{t=1}^T \hat u_t^2$. A standard LM-type statistic is obtained from the
centered r-squared from the regression

$$\hat u_t^2 \ \text{on}\ 1,\ \hat\lambda_t, \qquad t=1,\ldots,T. \qquad (2.8)$$

$TR_c^2$ is asymptotically $\chi^2_Q$ under

$$H_0': H_0 \text{ holds and, in addition, } E[(u_t^o)^4|x_t] = \kappa_o^2 > 0,\ t=1,2,\ldots$$

where $u_t^o \equiv y_t - m_t(x_t,\alpha_o)$. Regression (2.8) yields the "studentized" version
of the Breusch-Pagan [2] test as derived by Koenker [16]. The studentized
form of the test is robust to certain departures from normality, and it is
now widely used in the literature (see, e.g., Engle [8], Pagan and Hall [19],
and Pagan, Trivedi, and Hall [20]). Unfortunately, this form of the test is
not completely robust in the sense defined in this paper. The constancy of
$E[(u_t^o)^4|x_t]$ is an auxiliary assumption imposed under $H_0'$ that is required for
(2.8) to lead to a valid test. Normality of $u_t^o$ conditional on $x_t$ rules out
heterokurtosis under $H_0$, but it is easy to construct examples to illustrate
that the auxiliary assumption of homokurtosis is binding. If the regression
errors $u_t^o$ have a conditional t-distribution with constant variance but
degrees of freedom that otherwise depend on $x_t$ then $H_0$ holds but $H_0'$ does not.


Hsieh [14] and Pagan and Hall [19] have noted that just as with conditional
mean tests, the White [22] covariance matrix can be used to compute
heteroskedasticity tests that are robust to heterokurtosis. However, except
when $\hat\lambda_t$ is a scalar, computation of the statistic requires several matrix
operations. Pagan, Trivedi, and Hall [20] report the White [22] t-statistic
in a model for the variance of inflation when $\hat\lambda_t$ is a scalar. An alternative
robust form of the test is provided in Example 3.2 in section 3. It is
almost as easy to compute as the nonrobust form even when $\hat\lambda_t$ is a vector, but
it allows for heterokurtosis under the null.
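For comparison, the nonrobust studentized regression (2.8) discussed above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `studentized_bp`, the simulated data, and the indicators are hypothetical, and NumPy is assumed. It computes $TR_c^2$ from the centered r-squared, and so inherits the homokurtosis requirement described in the text.

```python
import numpy as np

def studentized_bp(u_hat, lam):
    """T times the centered R-squared from regressing u_hat**2 on (1, lam),
    as in regression (2.8)."""
    u2 = np.asarray(u_hat, dtype=float) ** 2
    T = u2.shape[0]
    W = np.column_stack([np.ones(T), lam])        # constant plus indicators
    coef, *_ = np.linalg.lstsq(W, u2, rcond=None)
    resid = u2 - W @ coef
    tss = np.sum((u2 - u2.mean()) ** 2)           # centered total sum of squares
    return T * (1.0 - (resid @ resid) / tss)

# Hypothetical data: linear mean, homoskedastic normal errors.
rng = np.random.default_rng(1)
T = 400
x = rng.normal(size=T)
y = 0.5 + 1.5 * x + rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
alpha_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ alpha_hat
lam = np.column_stack([x, x**2])                  # Q = 2 heteroskedasticity indicators
bp = studentized_bp(u_hat, lam)
print(f"TR_c^2 = {bp:.3f} (compare with a chi-square with Q = 2 d.f.)")
```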

There  are  other  examples  where  the  goal  is  to  test  hypotheses  about 
certain  aspects  of  a  conditional  distribution  but  auxiliary  assumptions  are 
maintained  under  the  null  hypothesis  in  order  to  obtain  a  simple 
regression-based  test.   Because  the  limiting  distributions  of  test  statistics 
are  usually  sensitive  to  violations  of  the  auxiliary  assumptions,  it  is 
important to use robust forms of tests for which $H_0$ includes only the
hypotheses  of  interest.   To  be  attractive  these  tests  must  be  easy  to  compute 
under  reasonably  broad  circumstances.   The  remainder  of  this  section  develops 
a  general  approach  to  constructing  robust,  regression-based  tests. 

Many specification tests, including those for conditional means and
variances, have asymptotically equivalent versions that can be derived as
follows. Let $\phi_t(y_t,x_t,\theta)$ be an $L \times 1$ random function defined on a parameter
set $\Theta \subset \mathbb{R}^P$. The null hypothesis of interest is expressed as

$$H_0: E[\phi_t(y_t,x_t,\theta_o)|x_t] = 0, \text{ for some } \theta_o \in \Theta,\ t=1,2,\ldots. \qquad (2.9)$$

By definition, $\theta_o$ is the "true" parameter vector under $H_0$. Because the null
hypothesis specifies that the conditional expectation of $\phi_t(y_t,x_t,\theta_o)$ given
the predetermined variables $x_t$ is zero, it is natural to call $\phi_t$ a
generalized residual function. For the conditional mean tests in a nonlinear
regression model, $L = 1$, $\theta = \alpha$, and $\phi_t(y_t,x_t,\theta) \equiv u_t(y_t,x_t,\alpha) = y_t - m_t(x_t,\alpha)$.
The tests for heteroskedasticity take $L = 1$, $\theta = (\alpha',\sigma^2)'$, and
$\phi_t(y_t,x_t,\theta) \equiv u_t^2(\alpha) - \sigma^2$.

The validity of (2.9) can be tested by choosing functions of the
predetermined variables $x_t$ and checking whether the sample covariances
between these functions and $\phi_t(y_t,x_t,\theta_o)$ are significantly different from
zero. In order to cover a broad range of circumstances that are of interest
to economists, it is useful to allow the misspecification indicators to
depend on $\theta$ and some nuisance parameters. Let $\pi \in \Pi$ denote an $N \times 1$ vector of
nuisance parameters. Let $\Lambda_t(x_t,\theta,\pi)$ be an $L \times Q$ matrix of misspecification
indicators and let $C_t(x_t,\theta,\pi)$ be an $L \times L$, symmetric and positive semi-definite
weighting matrix. Assume the availability of an estimator $\hat\theta_T$ such that
$T^{1/2}(\hat\theta_T - \theta_o) = O_p(1)$ under $H_0$. Also assume that the nuisance parameter
estimator $\hat\pi_T$ is such that $T^{1/2}(\hat\pi_T - \pi_T^o) = O_p(1)$ under $H_0$, where
$\{\pi_T^o: T=1,2,\ldots\}$ is a nonstochastic sequence in $\Pi$. It is because $\hat\pi_T$ need not have
an interpretable probability limit under $H_0$ that $\pi$ is called a nuisance
parameter.

A computable test statistic is the $Q \times 1$ vector

$$T^{-1}\sum_{t=1}^T \hat\Lambda_t'\hat C_t\hat\phi_t \qquad (2.10)$$

where "^" denotes that each function is evaluated at $\hat\theta_T$ or $(\hat\theta_T',\hat\pi_T')'$
(dependence of the summands in (2.10) on the sample size $T$ is suppressed for
convenience). For the conditional mean tests and the heteroskedasticity
tests, $\Lambda_t(x_t,\theta,\pi)$ is the $1 \times Q$ vector denoted $\lambda_t(x_t,\theta,\pi)$.

From the point of view of simply obtaining tests with known asymptotic
size under $H_0$ the p.s.d. matrix $C_t$ could be absorbed into $\Lambda_t$. But the
structure in (2.10) is exploited below to generate regression-based tests
with the additional property that they are asymptotically equivalent to
better known tests in classical circumstances. In the examples discussed
thus far $C_t(x_t,\theta,\pi) \equiv 1$. Section 3 covers some cases where it is profitable
to allow $C_t$ to be random.

To use (2.10) as the basis for a test of (2.9), the limiting
distribution of

$$\hat\xi_T \equiv T^{-1/2}\sum_{t=1}^T \hat\Lambda_t'\hat C_t\hat\phi_t \qquad (2.11)$$

under $H_0$ is needed. In general, finding the asymptotic distribution of $\hat\xi_T$
under $H_0$ entails finding the limiting distribution of

$$\xi_T^o \equiv T^{-1/2}\sum_{t=1}^T \Lambda_t^{o\prime} C_t^o \phi_t^o \qquad (2.12)$$

(values with "o" superscripts are evaluated at $\theta_o$ or $(\theta_o',\pi_T^{o\prime})'$) and the
limiting distribution of $T^{1/2}(\hat\theta_T - \theta_o)$ (the limiting distribution of
$T^{1/2}(\hat\pi_T - \pi_T^o)$ does not affect the limiting distribution of $\hat\xi_T$ under $H_0$). Because $\xi_T^o$
is the standardized sum of a vector martingale difference sequence under $H_0$,
its limiting distribution is generally derivable from a central limit theorem
(provided that $\{\Lambda_t^{o\prime}C_t^o\phi_t^o\}$ is also weakly dependent in an appropriate sense).
In standard cases $T^{1/2}(\hat\theta_T - \theta_o)$ will also be asymptotically normal. Given
the asymptotic covariance matrices of $\xi_T^o$ and $T^{1/2}(\hat\theta_T - \theta_o)$ and
differentiability assumptions on $\Lambda_t$, $C_t$, and $\phi_t$, it is possible to derive the
asymptotic covariance matrix of $\hat\xi_T$ by a standard mean value expansion. In
principle, deriving a quadratic form in $\hat\xi_T$ which has an asymptotic chi-square
distribution is straightforward. But nothing guarantees that the resulting
test statistic is easy to compute.


In certain instances test statistics based on $\hat\xi_T$ can be computed from
simple OLS regressions. The Newey-Tauchen-White approach can be applied when
$\hat\theta_T$ is the maximum likelihood estimator and the conditional density of $y_t$
given $x_t$ is correctly specified under $H_0$. In addition to $\hat\phi_t$, $\hat\Lambda_t$, and $\hat C_t$ the
score $\hat s_t$ of the conditional log-likelihood is needed for computation. The
NTW regression is simply

$$1 \ \text{on}\ \hat s_t',\ \hat\phi_t'\hat C_t\hat\Lambda_t, \qquad t=1,\ldots,T \qquad (2.13)$$

and one uses $TR_u^2$ as asymptotically $\chi^2_Q$. If interest lies in the case where
the entire conditional density is correctly specified under $H_0$ and $\hat\theta_T$ is the
maximum likelihood estimator of $\theta_o$, then the Newey-Tauchen-White approach is
computationally easier than the present approach. It should be noted,
however, that the NTW regression is valid only when $\hat\theta_T$ is the MLE, whereas
the procedure described below is valid when $\hat\theta_T$ is any $\sqrt{T}$-consistent estimator
of $\theta_o$. Wooldridge [31] discusses a C($\alpha$) version of the NTW statistic that
allows $\hat\theta_T$ to be any $\sqrt{T}$-consistent estimator. Another possible drawback to
the NTW regression is that there is growing evidence that it can yield tests
with poor finite sample properties even in the best possible circumstances
(Davidson and MacKinnon [6], Bollerslev and Wooldridge [1], Kennan and
Neumann [15]). This is at least in part because the NTW regression ignores
the generalized residual structure in (2.2) in always using the outer product
of the gradient in computing an estimate of the information matrix.

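A relatively simple statistic that typically imposes fewer assumptions
than the NTW approach is available if $\hat\xi_T$ is appropriately modified. Assume
that $\theta_o \in \mathrm{int}(\Theta)$ and that $\phi_t$ is differentiable on $\mathrm{int}(\Theta)$. Define
$\Phi_t(x_t,\theta_o) \equiv E[\nabla_\theta\phi_t(y_t,x_t,\theta_o)|x_t]$. Then, instead of basing a test statistic on the
covariance of the weighted misspecification indicator $\hat C_t^{1/2}\hat\Lambda_t$ and the weighted
generalized residuals $\hat C_t^{1/2}\hat\phi_t$, the idea is to first purge from $\hat C_t^{1/2}\hat\Lambda_t$ its
linear projection onto $\hat C_t^{1/2}\hat\Phi_t$, where $\hat\Phi_t \equiv \Phi_t(x_t,\hat\theta_T)$. That is, consider the
modified statistic

$$\ddot\xi_T \equiv T^{-1/2}\sum_{t=1}^T [\hat\Lambda_t - \hat\Phi_t\hat B_T]'\hat C_t\hat\phi_t \qquad (2.14)$$

where

$$\hat B_T \equiv \Big[\sum_{t=1}^T \hat\Phi_t'\hat C_t\hat\Phi_t\Big]^{-1}\sum_{t=1}^T \hat\Phi_t'\hat C_t\hat\Lambda_t \qquad (2.15)$$

is the $P \times Q$ matrix of regression coefficients from the matrix regression

$$\hat C_t^{1/2}\hat\Lambda_t \ \text{on}\ \hat C_t^{1/2}\hat\Phi_t, \qquad t=1,\ldots,T. \qquad (2.16)$$

$\ddot\xi_T$ can be written more concisely as

$$\ddot\xi_T \equiv T^{-1/2}\sum_{t=1}^T \ddot\Lambda_t'\tilde\phi_t \qquad (2.17)$$

where $\ddot\Lambda_t \equiv \hat C_t^{1/2}[\hat\Lambda_t - \hat\Phi_t\hat B_T]$, $t=1,\ldots,T$ are the $L \times Q$ matrix residuals from the
regression (2.16) and $\tilde\phi_t \equiv \hat C_t^{1/2}\hat\phi_t$. Note that by construction $\ddot\Lambda_t$ is weighted
by $\hat C_t^{1/2}$.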

It is important to realize that $\hat\xi_T$ and $\ddot\xi_T$ are not always asymptotically
equivalent in the sense that $\hat\xi_T - \ddot\xi_T \to_p 0$ under $H_0$. The indicators $\hat\Lambda_t$ and
$[\hat\Lambda_t - \hat\Phi_t\hat B_T]$ generally yield tests with different power functions.
Nevertheless, the robust form of the test almost always has a straightforward
interpretation. I return to this issue below.

Even when $\hat\xi_T$ and $\ddot\xi_T$ are not asymptotically equivalent $\ddot\xi_T$ can be used as
the basis for a useful specification test. The computational simplicity of a
limiting $\chi^2_Q$ quadratic form in $\ddot\xi_T$ is a consequence of the following theorem.


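Theorem 2.1: Assume that the following conditions hold under $H_0$:

(i) Regularity conditions A.1 in the appendix;

(ii) For some $\theta_o \in \mathrm{int}(\Theta)$,

(a) $E[\phi_t(y_t,x_t,\theta_o)|x_t] = 0$, $t=1,2,\ldots$;

(b) $\Phi_t(x_t,\theta_o) = E[\nabla_\theta\phi_t(y_t,x_t,\theta_o)|x_t]$, $t=1,2,\ldots$;

(c) $T^{1/2}(\hat\theta_T - \theta_o) = O_p(1)$, $T^{1/2}(\hat\pi_T - \pi_T^o) = O_p(1)$.

Then

$$\ddot\xi_T = T^{-1/2}\sum_{t=1}^T [\Lambda_t^o - \Phi_t^o B_T^o]'C_t^o\phi_t^o + o_p(1) \qquad (2.18)$$

where

$$B_T^o \equiv \Big[\sum_{t=1}^T E(\Phi_t^{o\prime}C_t^o\Phi_t^o)\Big]^{-1}\sum_{t=1}^T E[\Phi_t^{o\prime}C_t^o\Lambda_t^o].$$

In addition,

$$TR_u^2 \to_d \chi^2_Q,$$

where $R_u^2$ is the uncentered r-squared from the regression

$$1 \ \text{on}\ [\hat C_t^{1/2}\hat\phi_t]'[\hat C_t^{1/2}(\hat\Lambda_t - \hat\Phi_t\hat B_T)], \qquad t=1,\ldots,T \qquad (2.19)$$

and $\hat B_T$ is given by (2.15).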


Equation (2.18) has a very useful interpretation. Viewing $\ddot\xi_T$ as a
function of $\theta$, $\pi$, and $B$ evaluated at the estimators $\hat\theta_T$, $\hat\pi_T$, and $\hat B_T$, equation
(2.18) demonstrates that the asymptotic distribution of this vector is
unchanged when the estimators are replaced by their probability limits. Note
that the original statistic $\hat\xi_T$ does not generally have this property.

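Theorem 2.1 can be applied as follows:

(1) Given $\Lambda_t$, $C_t$, $\phi_t$, $\hat\theta_T$ and $\hat\pi_T$, compute $\hat\Lambda_t$, $\hat C_t$, $\hat\phi_t$, and $\hat\Phi_t$. Define
$\tilde\Lambda_t \equiv \hat C_t^{1/2}\hat\Lambda_t$, $\tilde\Phi_t \equiv \hat C_t^{1/2}\hat\Phi_t$, and $\tilde\phi_t \equiv \hat C_t^{1/2}\hat\phi_t$;

(2) Run the matrix regression

$$\tilde\Lambda_t \ \text{on}\ \tilde\Phi_t, \qquad t=1,\ldots,T \qquad (2.20)$$

and save the residuals, say $\ddot\Lambda_t$;

(3) Run the regression

$$1 \ \text{on}\ \tilde\phi_t'\ddot\Lambda_t, \qquad t=1,\ldots,T$$

and use $TR_u^2 = T - RSS$ as asymptotically $\chi^2_Q$ under $H_0$ assuming that $\ddot\Lambda_t$ does
not contain redundant indicators.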
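Steps (1)-(3) can be sketched in Python for the simplest case, a conditional mean test with scalar $y_t$, $L = 1$, and $C_t \equiv 1$, so that $\hat\phi_t = \hat u_t$ and $\hat\Phi_t = -\nabla_\alpha\hat m_t$ (the sign is irrelevant for the residuals in step (2)). This is a minimal sketch under those assumptions, not the paper's own code: the function name `robust_lm_stat`, the simulated heteroskedastic data, and the indicators are hypothetical, and NumPy is assumed.

```python
import numpy as np

def robust_lm_stat(u_hat, grad_m, lam):
    """Regression-based robust statistic for a conditional mean test
    with scalar y_t and C_t = 1 (steps (1)-(3) of the procedure)."""
    u_hat = np.asarray(u_hat, dtype=float)
    grad_m = np.asarray(grad_m, dtype=float)
    lam = np.asarray(lam, dtype=float)
    if lam.ndim == 1:
        lam = lam[:, None]
    T = u_hat.shape[0]
    # Step (2): regress each indicator on the gradient and keep the residuals.
    B_hat, *_ = np.linalg.lstsq(grad_m, lam, rcond=None)
    lam_resid = lam - grad_m @ B_hat
    # Step (3): regress 1 on u_hat * residual indicators; TR_u^2 = T - RSS.
    Z = u_hat[:, None] * lam_resid
    ones = np.ones(T)
    coef, *_ = np.linalg.lstsq(Z, ones, rcond=None)
    rss = np.sum((ones - Z @ coef) ** 2)
    return T - rss

# Hypothetical data: the linear mean is correct, the variance depends on x.
rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + np.abs(x) * rng.normal(size=T)
X = np.column_stack([np.ones(T), x])              # gradient of m_t in a linear model
alpha_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ alpha_hat
lam = np.column_stack([x**2, x**3])               # Q = 2 misspecification indicators
stat = robust_lm_stat(u_hat, X, lam)
print(f"robust TR_u^2 = {stat:.3f} (compare with a chi-square with Q = 2 d.f.)")
```

Because step (3) has a vector of ones as the dependent variable, the uncentered total sum of squares equals $T$, which is why $TR_u^2 = T - RSS$ needs no separate r-squared calculation.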


It must be emphasized that condition (ii.b), which requires that
$E[\nabla_\theta\phi_t(y_t,x_t,\theta_o)|x_t]$ be computable under the null hypothesis, can impose
additional restrictions on $D_t$ that must be satisfied in order for (1)-(3) to
be a valid procedure under $H_0$. If additional assumptions are used in forming
$\Phi_t(x_t,\theta_o)$ then the "implicit null hypothesis" includes more than just (2.9).
But as shown in Wooldridge [30], $\Phi_t$ is always computable under the relevant
null hypothesis for conditional mean (hence conditional probability) or
conditional variance testing in a linear exponential family. These are
leading, but certainly not the only, cases where one would like to be
robust against other distributional misspecifications. Example 3.3 in
section 3 shows that no auxiliary assumptions are needed to compute
regression-based specification tests of jointly parameterized mean and
variance functions that are robust to nonnormality.

In many other situations $\Phi_t(x_t,\theta_o)$ is easily computed if some additional
and in many cases standard assumptions are imposed under $H_0$. For example, in
the nonlinear regression example suppose that $\phi_t(y_t,x_t,\theta) \equiv [y_t - m_t(x_t,\alpha)]^3$
where $\theta$ contains $\alpha$ and any conditional variance parameters. $\Phi_t(\theta_o)$ is easily
seen to be $\Phi_t(\theta_o) = -3\nabla_\alpha m_t(\alpha_o)V(y_t|x_t)$. Most tests for skewness in the
literature impose homoskedasticity or some other conditional variance
assumption under the null, and $\Phi_t(\theta_o)$ is readily computed once a model for
$V(y_t|x_t)$ has been specified. Tests for skewness are typically carried out
after the first two moments are thought to be correctly specified. If this
is the case, Theorem 2.1 imposes no auxiliary assumptions under the null.
However,  it  should  be  noted  that  the  choice  of  A^_  is  limited  in  this  example 

L 

A.  A 

by  the  form  of  $  .   If  A   is  linearly  related  to  $   then  the  modified 
J  t       t  J  t 

indicator  A   is  simply  zero.   In  a  linear  model  with  conditional 

A  A 

homoskedasticity  and  repressors  w  ,  $   is  proportional  to  w  .   Thus ,  A 

cannot  contain  linear  combinations  of  w  .   In  particular,  if  w  contains 

t      F  t 

unity  then  the  choice  At  =  1  is  unavailable;  this  rules  out  a  standard  test 

for  unconditional  skewness  based  on  X  u--   ^  Newey-Tauchen-White  test  would 

t=l  Z 

allow  more  flexibility  in  the  choice  of  A^  for  this  example. 
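
The expression for $\Phi_t(\theta_o)$ in the skewness example can be checked in two lines. The derivation below is a sketch that assumes differentiation and conditional expectation may be interchanged and that $m_t$ is the correctly specified conditional mean, so that $E[(y_t - m_t(x_t,\alpha_o))^2|x_t] = V(y_t|x_t)$:

```latex
\nabla_\alpha \phi_t(y_t,x_t,\theta)
  = -3\,[y_t - m_t(x_t,\alpha)]^2\,\nabla_\alpha m_t(x_t,\alpha),
\qquad
E[\nabla_\alpha \phi_t(y_t,x_t,\theta_o)\,|\,x_t]
  = -3\,\nabla_\alpha m_t(\alpha_o)\,E\{[y_t - m_t(x_t,\alpha_o)]^2\,|\,x_t\}
  = -3\,\nabla_\alpha m_t(\alpha_o)\,V(y_t|x_t).
```

With a linear mean $m_t = w_t\alpha$ and constant variance, the right-hand side is a constant multiple of $w_t$, which is why indicators that are linear in $w_t$ are annihilated by the regression in step (2).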

As another example, consider testing for nonconstancy of the conditional
first absolute moment of the regression errors. Under $H_0$,
$E[\,|y_t - m_t(x_t,\alpha_o)|\;|\,x_t] = \kappa_o > 0$. The generalized residual is $\phi_t(y_t,x_t,\theta) =
|y_t - m_t(x_t,\alpha)| - \kappa$ where $\theta = (\alpha',\kappa)'$. Although $\phi_t(\theta)$ is not strictly
differentiable in $\alpha$, it is differentiable almost surely under the usual
assumptions imposed in these contexts. The quasi-gradient with respect to $\alpha$
is $\nabla_\alpha\phi_t(\theta_o) = -(1[y_t - m_t(\alpha_o) > 0] - 1[y_t - m_t(\alpha_o) < 0])\nabla_\alpha m_t(\alpha_o)$. Under
conditional symmetry of the distribution of $y_t$ given $x_t$,
$E(1[y_t - m_t(\alpha_o) > 0]\,|\,x_t) = E(1[y_t - m_t(\alpha_o) < 0]\,|\,x_t)$, so that $E[\nabla_\alpha\phi_t(\theta_o)\,|\,x_t] =
0$. Also, $E[\nabla_\kappa\phi_t(\theta_o)\,|\,x_t] = -1$, and $\Phi_t(x_t,\theta_o)$ is simply $(0,-1)$. If $m_t$ is the
conditional mean function and $\hat\alpha_T$ is an M-estimator other than the NLS
estimator (e.g. the least absolute deviations estimator) then conditional
symmetry is needed anyway for $\hat\alpha_T$ to be consistent for $\alpha_o$.

Assumption (ii.c) is perhaps more properly listed as a regularity
condition, but it is placed in the text to emphasize the generality of
Theorem 2.1. Having $\sqrt{T}$-consistent estimators of $\theta_o$ and $\pi_o$ is a fairly weak
requirement, and allows relatively simple specification tests when $\theta_o$ (as
well as $\pi_o$) has been estimated by an inefficient procedure (under classical
assumptions). An application to the tobit model is given in section 3. The
tobit example has the feature of imposing correct specification of the
conditional distribution under $H_0$ but, unlike the usual LM or NTW
regressions, Theorem 2.1 can be used when an estimator other than the MLE is
available.

A yet unresolved issue is the relationship between $\hat\xi_T$ and $\ddot\xi_T$. There is
a simple characterization of their asymptotic equivalence under $H_0$. The
proof of the following lemma follows immediately from the construction of $\ddot\xi_T$.

Lemma 2.2: Let the conditions of Theorem 2.1 hold. If, in addition,

(iii) $T^{-1/2}\sum_{t=1}^T \hat\Phi_t'\hat C_t\hat\phi_t = o_p(1)$,

then

$$\ddot\xi_T - \hat\xi_T = o_p(1). \tag{2.21}$$

When (iii) holds, $\ddot\xi_T$ and $\hat\xi_T$ are asymptotically equivalent under $H_0$.

Condition (iii) is usefully interpreted as the sample covariance between
$\{\hat C_t^{1/2}\hat\Phi_t: t=1,\dots,T\}$ and $\{\hat C_t^{1/2}\hat\phi_t: t=1,\dots,T\}$ being asymptotically zero. It is
trivially satisfied if

$$\sum_{t=1}^T \Phi_t(\hat\theta_T)'C_t(\hat\theta_T,\hat\pi_T)\phi_t(\hat\theta_T) = 0 \tag{2.22}$$

is the defining first-order condition for $\hat\theta_T$. This is frequently the case,
in particular when $\hat\theta_T$ is a quasi-maximum likelihood estimator (QMLE) of the
parameters of a conditional mean (see Wooldridge [30]) or of the parameters
of a jointly parameterized conditional mean and conditional variance (see
Example 3.3 below). In these examples (2.21) also holds (trivially) for
local alternatives, so that the difference between the test based on $\ddot\xi_T$ and,
say, the NTW test based on $\hat\xi_T$, is simply that different estimators have been
used for the moment matrices appearing in the quadratic form. Consequently,
under the conditions required for the classical test to be valid, the two
procedures are asymptotically equivalent under local alternatives; robustness
is achieved without losing asymptotic efficiency. In addition to having
known asymptotic size under $H_0$, the robust test has a limiting noncentral
chi-square distribution even when the auxiliary assumptions are violated
under local alternatives (e.g. heteroskedasticity is present in a dynamic
regression model).

Lemma 2.2 does not directly provide a description of the local behavior
of $\ddot\xi_T$ when (iii) fails to hold under local alternatives, but viewed from a
slightly different angle it provides useful insight. Note that Theorem 2.1
implies that the quadratic form in $\ddot\xi_T$ has an asymptotic chi-square
distribution under $H_0$ regardless of whether or not (iii) holds; the issue is
how to characterize the directions of misspecification that $\ddot\xi_T$ has power
against when (iii) does not hold. Fortunately, it is frequently the case
that $\ddot\xi_T$ is asymptotically equivalent to some well-known statistic under local
alternatives, when classical assumptions hold. This facilitates interpreting
a rejection when (iii) fails to hold.

To characterize the local behavior of $\ddot\xi_T$, it is useful to be somewhat
more explicit about the nature of the local alternatives. Let $\theta_T^*$ and $\pi_T^*$ be
nonstochastic sequences such that $\sqrt{T}(\theta_T^* - \theta_o) = O(1)$ and $\sqrt{T}(\pi_T^* - \pi_o) = O(1)$.
$\{\theta_T^*: T=1,2,\dots\}$ indexes the sequence of local alternatives, but, as with $\theta_o$
under $H_0$, $\theta_T^*$ need not uniquely index the nonnull probability measure. $\pi_T^*$ is
the plim of the estimator $\hat\pi_T$ under the sequence of local alternatives $\{H_{1T}:
T=1,2,\dots\}$. Assume that the conditions of Theorem 2.1 are supplemented with
conditions of the form


$$T^{-1}\sum_{t=1}^T E[G_t(\theta_T^*,\pi_T^*)] - T^{-1}\sum_{t=1}^T E[G_t(\theta_o,\pi_o)] \to 0$$

as $T \to \infty$ for various functions $G_t$. This corresponds to standard assumptions
in  the  analysis  of  the  local  behavior  of  test  statistics.   The  arguments  of 
Theorem  2.1  can  be  used  to  show  that  under  the  sequence  of  local  alternatives 


$$\ddot\xi_T = T^{-1/2}\sum_{t=1}^T [\Lambda_t^* - \Phi_t^*B_T^*]'C_t^*\phi_t^* + o_p(1) \tag{2.23}$$

where

$$B_T^* \equiv \Big(\sum_{t=1}^T E[\Phi_t^{*\prime}C_t^*\Phi_t^*]\Big)^{-1}\sum_{t=1}^T E[\Phi_t^{*\prime}C_t^*\Lambda_t^*]$$

and values with a "*" superscript are evaluated at $\theta_T^*$ or $(\theta_T^*,\pi_T^*)$. Equation
(2.23) is the extension of (2.18) to local alternatives and implies that the
local limiting distribution of $\ddot\xi_T$ is the same when $\hat\theta_T$ and $\hat\pi_T$ are replaced by
their plims $\theta_T^*$ and $\pi_T^*$, provided that $\sqrt{T}(\hat\theta_T - \theta_T^*) = O_p(1)$ and $\sqrt{T}(\hat\pi_T - \pi_T^*) =
O_p(1)$ under $\{H_{1T}\}$. This implies that if $(\hat\theta_{T1},\hat\pi_{T1})$ and $(\hat\theta_{T2},\hat\pi_{T2})$ are both
$\sqrt{T}$-consistent estimators of $(\theta_T^*,\pi_T^*)$ under $\{H_{1T}\}$ then

$$\ddot\xi_{T1} - \ddot\xi_{T2} = o_p(1) \tag{2.24}$$

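under $\{H_{1T}\}$, where $\ddot\xi_{T1}$ is evaluated at $(\hat\theta_{T1},\hat\pi_{T1})$ and $\ddot\xi_{T2}$ is evaluated at
$(\hat\theta_{T2},\hat\pi_{T2})$. Suppose, in addition, that $\hat\theta_{T2}$ and $\hat\pi_{T2}$ are chosen to satisfy
(iii), i.e.

$$T^{-1/2}\sum_{t=1}^T \Phi_t(\hat\theta_{T2})'C_t(\hat\theta_{T2},\hat\pi_{T2})\phi_t(\hat\theta_{T2}) = o_p(1). \tag{2.25}$$

Then, by the analog of Lemma 2.2 for local alternatives, $\ddot\xi_{T2} - \hat\xi_{T2} = o_p(1)$
where $\hat\xi_{T2}$ is evaluated at $(\hat\theta_{T2},\hat\pi_{T2})$. Along with (2.24) this implies that

$$\ddot\xi_{T1} - \hat\xi_{T2} = o_p(1) \tag{2.26}$$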

under $H_0$ and local alternatives. Conclusion (2.26) is simple yet very
powerful. It means that for any $\sqrt{T}$-consistent estimator $\hat\theta_{T1}$, $\ddot\xi_{T1}$ is
asymptotically equivalent to $\hat\xi_{T2}$ because $\hat\xi_{T2}$ has been evaluated at an
estimator that satisfies the asymptotic first order condition (2.22).
Whenever such an estimator is available the interpretation of $\ddot\xi_T$ is
straightforward: $\ddot\xi_T$ is asymptotically equivalent to the vector that
originally motivated the test statistic, $\hat\xi_T$, when $\hat\xi_T$ is evaluated at the
estimator that solves the first order condition. It does not matter which
estimator is used in computing $\ddot\xi_T$, provided that it is $\sqrt{T}$-consistent. Thus,
the interpretation of $\ddot\xi_T$ does not depend on the estimator used in computing
it. In many situations there is available an estimator that satisfies
(2.22), and typically it solves a well-known problem. Interpreting $\ddot\xi_T$,
whether or not (iii) holds, typically reduces to interpreting an LM-type
statistic in a particular weighted nonlinear regression model or in a model
estimated by MLE under normality. An example of a case where an estimator
satisfying (2.22) does not have a simple interpretation involves testing for
skewness as discussed above. In a linear model with homoskedasticity, the
estimator that solves the first order condition (2.22) sets the correlation
between the regressors and the third moment of the errors equal to zero.
This method of moments estimator is not particularly easy to interpret.

The  reasoning  of  the  previous  paragraph  is  applied  to  the  tobit  model  in 
the  next  section.   There  it  is  seen  that  a  test  for  the  conditional  mean 
using,  e.g.,  Heckman's  [12]  two-step  estimator,  is  asymptotically  equivalent 
to  a  standard  Davidson-MacKinnon  [5]  test  for  comparing  two  readily 
interpretable  weighted  nonlinear  regression  models. 

The  results  of  Theorem  2.1  and  Lemma  2.2  are  asymptotic.   Very  little  is 
known  about  the  finite  sample  performance  of  the  statistics  of  Theorem  2.1, 
especially  for  nonlinear  dynamic  models.   It  should  be  emphasized,  however, 
that  even  though  the  regression  in  step  (3)  uses  unity  as  the  dependent 
variable,  these  statistics  do  not  necessarily  have  the  same  finite  sample 
biases  sometimes  exhibited  by  outer  product-type  regressions.   Unlike 
standard  outer  product  regressions,  the  robust  form  does  exploit  the 
generalized residual form of the test statistic. In fact, the simulations of
Davidson and MacKinnon [6] for a static regression model and of Bollerslev
and Wooldridge [1] for an AR-GARCH model suggest that the orthogonalization
of $\hat C_t^{1/2}\hat\Lambda_t$ with respect to $\hat C_t^{1/2}\hat\Phi_t$ in step (2) improves the finite sample
performance relative to the NTW outer product regression, even under
classical assumptions. That this might be the case was previously suggested
to me by Peter Phillips.

3. Examples of Robust, Regression-Based Tests

Example 3.1: Let $y_t$ be a scalar and let $\{m_t(x_t,\alpha): \alpha \in A\}$, $A \subset R^P$, be a
parametric family for the conditional expectation of $y_t$ given $x_t$. The null
hypothesis is

$$H_0:\ E(y_t|x_t) = m_t(x_t,\alpha_o), \quad \text{some } \alpha_o \in A, \quad t=1,2,\dots \tag{3.1}$$

Let $\{h_t(x_t,\gamma): \gamma \in \Gamma\}$ be a sequence of weighting functions such that $h_t(x_t,\gamma)
> 0$, and suppose that $\hat\gamma_T$ is an estimator such that $T^{1/2}(\hat\gamma_T - \gamma_T^o) = O_p(1)$,
where $\{\gamma_T^o\} \subset \Gamma$. It is not assumed that $\{h_t(x_t,\gamma): \gamma \in \Gamma\}$ contains a version
of $V(y_t|x_t)$ or that $h_t(x_t,\gamma_o)$ is proportional to $V(y_t|x_t)$ for some $\gamma_o \in \Gamma$.
The researcher chooses a set of weights $\{h_t(x_t,\hat\gamma_T)\}$ and performs weighted NLS
(WNLS), or uses some other $\sqrt{T}$-consistent estimator for $\alpha_o$. However, no
matter which estimator for $\alpha_o$ is used, the tests are motivated by the WNLS
first order condition

$$\sum_{t=1}^T \nabla_\alpha m_t(\alpha)'[y_t - m_t(\alpha)]/h_t(\hat\gamma_T) = 0. \tag{3.2}$$

A general class of diagnostics is obtained by replacing $\nabla_\alpha m_t(\alpha)$ with a $1 \times Q$
vector of misspecification indicators evaluated at the estimators:

$$\sum_{t=1}^T \Lambda_t(\hat\alpha_T,\hat\pi_T)'[y_t - m_t(\hat\alpha_T)]/h_t(\hat\gamma_T) \tag{3.3}$$

where $\pi$ can contain $\gamma$ and other nuisance parameters. In the notation of
Theorem 2.1, $\theta = \alpha$, $\phi_t(\theta) = y_t - m_t(\alpha)$, $\Lambda_t(\theta,\pi) = \Lambda_t(\alpha,\pi)$, and $C_t(\theta,\pi) =
1/h_t(\gamma)$. It is easy to see that computation of $\Phi_t(x_t,\theta_o)$ requires no
auxiliary assumptions under $H_0$, and in fact $\Phi_t(x_t,\theta_o) = -\nabla_\alpha m_t(x_t,\alpha_o)$.

The usual LM-type statistic, which is $TR_u^2$ from the regression

$$\hat u_t/\sqrt{\hat h_t} \ \text{ on } \ \nabla_\alpha\hat m_t/\sqrt{\hat h_t},\ \hat\Lambda_t/\sqrt{\hat h_t}, \qquad t=1,\dots,T,$$

requires that $\hat\alpha_T$ be asymptotically equivalent to the WNLS estimator and that
$h_t(\hat\gamma_T)$ be a consistent estimator of $V(y_t|x_t)$ up to scale. The following
procedure is valid under $H_0$ for any $\sqrt{T}$-consistent estimator $\hat\alpha_T$ without any
assumptions about $V(y_t|x_t)$:

(i) Let $\hat\alpha_T$ be a $\sqrt{T}$-consistent estimator of $\alpha_o$. Compute the residuals
$\hat u_t$, the gradient $\nabla_\alpha m_t(\hat\alpha_T)$, and the indicator $\Lambda_t(\hat\alpha_T,\hat\pi_T)$. Define $\tilde u_t \equiv \hat h_t^{-1/2}\hat u_t$,
$\nabla_\alpha\tilde m_t \equiv \hat h_t^{-1/2}\nabla_\alpha\hat m_t$, and $\tilde\Lambda_t \equiv \hat h_t^{-1/2}\hat\Lambda_t$;

(ii) Regress $\tilde\Lambda_t$ on $\nabla_\alpha\tilde m_t$ and save the $1 \times Q$ residuals, say $\ddot\Lambda_t$;

(iii) Regress 1 on $\tilde u_t\ddot\Lambda_t$ and use $TR_u^2 = T - RSS$ from this regression as
asymptotically $\chi^2_Q$ under $H_0$.
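
Steps (i)-(iii) of Example 3.1 translate directly into two least squares calls. The sketch below is illustrative only: the function name `robust_mean_test`, its arguments, and the simulated data are assumptions made here, and the residuals, gradient and indicators are taken as given from whatever $\sqrt{T}$-consistent estimator was used.

```python
import numpy as np


def robust_mean_test(u_hat, grad_m, lam, h=None):
    """TR_u^2 = T - RSS from steps (i)-(iii) of Example 3.1.

    u_hat  : (T,)   residuals y_t - m_t(alpha_hat)
    grad_m : (T, P) gradient of m_t with respect to alpha
    lam    : (T, Q) misspecification indicators
    h      : (T,)   positive weights h_t(gamma_hat); None means h_t = 1
    """
    u_hat = np.asarray(u_hat, dtype=float)
    T = u_hat.shape[0]
    h = np.ones(T) if h is None else np.asarray(h, dtype=float)
    w = 1.0 / np.sqrt(h)

    # Step (i): weight residuals, gradient and indicators by h_t^{-1/2}.
    u_w = w * u_hat
    grad_w = w[:, None] * np.asarray(grad_m, dtype=float)
    lam_w = w[:, None] * np.asarray(lam, dtype=float)

    # Step (ii): purge the weighted gradient from the weighted indicators.
    B, *_ = np.linalg.lstsq(grad_w, lam_w, rcond=None)
    lam_res = lam_w - grad_w @ B

    # Step (iii): regress 1 on u_t times the residualized indicators.
    Z = u_w[:, None] * lam_res
    ones = np.ones(T)
    coef, *_ = np.linalg.lstsq(Z, ones, rcond=None)
    return T - np.sum((ones - Z @ coef) ** 2)


# Illustrative use on simulated data (all values here are made up).
rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + np.abs(x) * rng.normal(size=T)  # heteroskedastic errors
X = np.column_stack([np.ones(T), x])                # gradient of a linear mean
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
stat = robust_mean_test(y - X @ beta, X, (x ** 2)[:, None])
```

In this usage the indicator is $x_t^2$, so `stat` would be compared with a $\chi^2_1$ critical value; no model for $V(y_t|x_t)$ enters the calculation.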

This procedure with $\hat h_t \equiv 1$ was first proposed by Davidson and MacKinnon
[6] in the context of a nonlinear regression model with independent errors
and unconditional heteroskedasticity. It was independently suggested by
Wooldridge [29] for nonlinear, possibly dynamic regression models with
conditional or unconditional heteroskedasticity under a martingale difference
assumption on the regression errors. Theorem 2.1 further demonstrates that
$\hat\alpha_T$ need not be the NLS estimator. The indicator $\hat\Lambda_t$ can be chosen to yield LM
tests, Hausman tests based on two WNLS regressions, and tests of nonnested
hypotheses, such as the Davidson-MacKinnon [5] test, which do not require
correct specification of the conditional variance of $y_t$ given $x_t$.
Conditional mean tests in the more general context of multivariate linear
exponential families are considered in more detail in Wooldridge [30].

The estimator that satisfies condition (iii) of Lemma 2.2 is the WNLS
estimator based on weights $1/h_t(\hat\gamma_T)$. From the remarks following Lemma 2.2,
the robust test statistic employing any $\sqrt{T}$-consistent estimator is
asymptotically equivalent to the LM statistic based on (3.3) when (3.3) is
evaluated at the WNLS estimator, $h_t(\gamma_o)$ is proportional to $V(y_t|x_t)$, and $\hat\gamma_T$
is a $\sqrt{T}$-consistent estimator of $\gamma_o$. For efficiency reasons it is prudent to
put some thought into the choice of $h_t$.

Example 3.2: Suppose now, in the context of Example 3.1, the goal is to test
whether for some $\gamma_o \in \Gamma$, $h_t(x_t,\gamma_o)$ is proportional to $V(y_t|x_t)$. Let $v_t(x_t,\gamma)
= \sigma^2 h_t(x_t,\gamma)$ where $\sigma^2$ is absorbed into $\gamma$. The null hypothesis is

$$H_0:\ E(y_t|x_t) = m_t(x_t,\alpha_o), \quad V(y_t|x_t) = v_t(x_t,\gamma_o), \quad \alpha_o \in A,\ \gamma_o \in \Gamma, \quad t=1,2,\dots \tag{3.4}$$

Let $\hat\alpha_T$ be the WNLS or some other $\sqrt{T}$-consistent estimator of $\alpha_o$, and let $\hat\gamma_T$ be
any $\sqrt{T}$-consistent estimator of $\gamma_o$. Let $\Lambda_t(x_t,\theta,\pi)$ be a $1 \times Q$ vector of
indicators where $\theta = (\alpha',\gamma')'$. Most tests for variances can be derived from
a statistic of the form

$$\sum_{t=1}^T \hat\Lambda_t'(\hat u_t^2 - \hat v_t)/\hat v_t^2. \tag{3.5}$$

Choosing $\Lambda_t(x_t,\theta,\pi)$ to be the nonconstant, nonredundant elements of
$\mathrm{vech}[\nabla_\alpha m_t(\alpha)'\nabla_\alpha m_t(\alpha)]$ leads to the White [24] information matrix test in the
context of quasi-maximum likelihood estimation in a linear exponential
family (see Wooldridge [30]). When $v_t(\gamma) = \sigma^2$ choosing $\Lambda_t(x_t,\theta,\pi) = w_t$,
where $w_t$ is a $1 \times Q$ subvector of $x_t$, leads to the Lagrange Multiplier test for
a general form of heteroskedasticity (see Breusch and Pagan [2]). Setting
$\Lambda_t(x_t,\theta,\pi) = (u_{t-1}^2(\alpha),\dots,u_{t-Q}^2(\alpha))$ gives Engle's [8] test for ARCH(Q) under a
null of conditional homoskedasticity.

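The correspondences for Theorem 2.1 are $L = 1$, $\theta = (\alpha',\gamma')'$, $\phi_t(\theta) =
u_t^2(\alpha) - v_t(\gamma)$, $C_t(\theta,\pi) = 1/v_t^2(\gamma)$. Note that $\nabla_\theta\phi_t(\theta) = [-2\nabla_\alpha m_t(\alpha)u_t(\alpha) \mid
-\nabla_\gamma v_t(\gamma)]$. Under $H_0$, $E[u_t(\alpha_o)|x_t] = 0$ so that $\Phi_t(x_t,\theta_o) = E[\nabla_\theta\phi_t(\theta_o)|x_t] =
[0 \mid -\nabla_\gamma v_t(\gamma_o)]$; no additional assumptions are needed under $H_0$ to compute
$\Phi_t(x_t,\theta_o)$.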

The choice $C_t(\theta,\pi) \equiv 1/v_t^2(\gamma)$ in (3.5) is motivated by the structure of
the score of the normal log-likelihood with mean function $m_t(\alpha)$ and variance
function $v_t(\gamma)$. In particular, the scaling $1/\hat v_t^2$ appears in the variance
tests of Godfrey [10] and Breusch and Pagan [3]. The standard LM statistic
in this context is $TR_u^2$ from the regression

$$(\hat u_t^2 - \hat v_t)/\hat v_t \ \text{ on } \ \nabla_\gamma\hat v_t/\hat v_t,\ \hat\Lambda_t/\hat v_t, \qquad t=1,\dots,T. \tag{3.6}$$

In addition to (3.4) this test imposes

$$E[u_t^4(\alpha_o)\,|\,x_t] = \kappa_o[v_t(x_t,\gamma_o)]^2, \quad \text{some } \kappa_o > 0 \tag{3.7}$$

under the null, so that it is nonrobust. Moreover, as pointed out by Breusch
and Pagan [3], (3.6) is generally valid only if $\hat\gamma_T$ is the QMLE of $\gamma_o$ under
normality. Breusch and Pagan [3] offer a computationally simple $C(\alpha)$ test
that allows $\hat\gamma_T$ to be any $\sqrt{T}$-consistent estimator of $\gamma_o$, but it still requires
that (3.7) hold under the null. The NTW procedure applied to this case is
valid essentially in the same cases as the usual LM statistic.

The robust procedure obtained from Theorem 2.1 is easy to compute,
imposes only (3.4) under the null, and allows $\hat\gamma_T$ to be any $\sqrt{T}$-consistent
estimator of $\gamma_o$.

(i) Let $\hat\alpha_T$ be a $\sqrt{T}$-consistent estimator of $\alpha_o$, and let $\hat\gamma_T$ be a
$\sqrt{T}$-consistent estimator of $\gamma_o$. Compute the residuals $\hat u_t$, the gradient
$\nabla_\gamma v_t(\hat\gamma_T)$, and the indicator $\Lambda_t(\hat\theta_T,\hat\pi_T)$. Define $\tilde\phi_t \equiv (\hat u_t^2 - \hat v_t)/\hat v_t = \hat u_t^2/\hat v_t - 1$,
$\nabla_\gamma\tilde v_t \equiv \nabla_\gamma\hat v_t/\hat v_t$, and $\tilde\Lambda_t \equiv \hat\Lambda_t/\hat v_t$;

(ii) Regress $\tilde\Lambda_t$ on $\nabla_\gamma\tilde v_t$ and save the $1 \times Q$ residuals, say $\ddot\Lambda_t$;

(iii) Regress 1 on $\tilde\phi_t\ddot\Lambda_t$ and use $TR_u^2 = T - RSS$ from this regression as
asymptotically $\chi^2_Q$ under $H_0$.

Interestingly, when $v_t(x_t,\gamma) \equiv \sigma^2$, so that the null is conditional
homoskedasticity, the regression in (ii) simply demeans the indicators.
Given $\hat u_t$, $\hat\sigma_T^2$, and a choice for $\hat\Lambda_t$, the $\chi^2_Q$ statistic is obtained as $TR_u^2$ from
the regression

$$1 \ \text{ on } \ (\hat u_t^2 - \hat\sigma_T^2)(\hat\Lambda_t - \bar\Lambda_T), \qquad t=1,\dots,T \tag{3.8}$$

where $\bar\Lambda_T \equiv T^{-1}\sum_{t=1}^T \hat\Lambda_t$. This procedure is asymptotically equivalent to the
traditional regression form (2.8) under the additional assumption that
$E[u_t^4(\alpha_o)\,|\,x_t]$ is constant. Note that (3.6) and the regression (2.8) usually
yield different test statistics that are not asymptotically equivalent under
$H_0$. The demeaning of the indicators may not seem like much of a
modification, but it yields an asymptotically chi-square distributed
statistic without the additional assumption of constant fourth moment for $u_t$.
In the case of the White test in a linear time series model, the demeaning of
the indicators yields a statistic which is asymptotically equivalent to
Hsieh's [14] suggestion for a robust form of the White test, but the above
statistic is significantly easier to compute.
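
Regression (3.8) needs only the squared residuals and the demeaned indicators. The following sketch is an illustration under assumptions made here: the function name `robust_het_test` is hypothetical, and $\hat\sigma_T^2$ is taken to be the sample mean of $\hat u_t^2$.

```python
import numpy as np


def robust_het_test(u_hat, lam):
    """T - RSS from regression (3.8) under a null of homoskedasticity.

    u_hat : (T,)   residuals from the estimated mean
    lam   : (T, Q) indicators, e.g. a subvector w_t of the regressors
    """
    u2 = np.asarray(u_hat, dtype=float) ** 2
    lam = np.asarray(lam, dtype=float)
    T = u2.shape[0]
    sigma2 = u2.mean()                      # assumed estimator of sigma^2
    # Regressors of (3.8): (u_t^2 - sigma^2) times the demeaned indicators.
    Z = (u2 - sigma2)[:, None] * (lam - lam.mean(axis=0))
    ones = np.ones(T)
    coef, *_ = np.linalg.lstsq(Z, ones, rcond=None)
    return T - np.sum((ones - Z @ coef) ** 2)
```

With a single indicator ($Q = 1$) the statistic reduces to $(\sum_t z_t)^2/\sum_t z_t^2$, where $z_t$ is the regressor in (3.8), which makes the role of the demeaning easy to see.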

In the case of the ARCH test, $TR_u^2$ from the regression in (3.8) is
asymptotically equivalent to $TR_u^2$ from the regression

$$1 \ \text{ on } \ (\hat u_t^2 - \hat\sigma_T^2)(\hat u_{t-1}^2 - \hat\sigma_T^2),\ \dots,\ (\hat u_t^2 - \hat\sigma_T^2)(\hat u_{t-Q}^2 - \hat\sigma_T^2), \qquad t=Q+1,\dots,T. \tag{3.9}$$

The regression based form in (3.9) is robust to departures from the
conditional normality assumption, and from any other auxiliary assumptions,
such as constant conditional fourth moment for $u_t$. Nevertheless, it is
asymptotically equivalent to the usual ARCH test under normality.
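
A sketch of regression (3.9) follows. The function name `robust_arch_test` is hypothetical, $\hat\sigma_T^2$ is assumed to be the full-sample mean of $\hat u_t^2$, and the returned value uses the $T - Q$ usable observations, so it is the number of observations minus RSS for that regression.

```python
import numpy as np


def robust_arch_test(u_hat, Q):
    """Observations minus RSS from regression (3.9), a robust ARCH(Q) test."""
    u2 = np.asarray(u_hat, dtype=float) ** 2
    T = u2.shape[0]
    d = u2 - u2.mean()                      # u_t^2 - sigma^2
    # Column j holds (u_t^2 - sigma^2)(u_{t-j}^2 - sigma^2) for t = Q+1..T.
    Z = np.column_stack([d[Q:] * d[Q - j:T - j] for j in range(1, Q + 1)])
    ones = np.ones(T - Q)
    coef, *_ = np.linalg.lstsq(Z, ones, rcond=None)
    return (T - Q) - np.sum((ones - Z @ coef) ** 2)
```

The result would be compared with a $\chi^2_Q$ critical value; nothing in the calculation uses normality or a constant conditional fourth moment.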

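Example 3.3: Theorem 2.1 can also be applied to models that jointly
parameterize the conditional mean and conditional variance. Again, let $y_t$ be
a scalar, and consider LM tests that are robust to nonnormality. The
unconstrained conditional mean and variance functions are

$$\{\mu_t(x_t,\delta),\ \omega_t(x_t,\delta):\ \delta \in \Delta\} \tag{3.10}$$

where $\Delta \subset R^M$. It is assumed that

$$E(y_t|x_t) = \mu_t(x_t,\delta_o), \quad V(y_t|x_t) = \omega_t(x_t,\delta_o), \quad \text{some } \delta_o \in \Delta. \tag{3.11}$$

Take the null hypothesis to be

$$H_0:\ \delta_o = r(\theta_o) \quad \text{for some } \theta_o \in \Theta \subset R^P \tag{3.12}$$

where $P < M$ and $r$ is continuously differentiable on int($\Theta$). Let $m_t(\theta) \equiv
\mu_t(r(\theta))$ and $v_t(\theta) \equiv \omega_t(r(\theta))$ be the constrained mean and variance functions.
QMLE is carried out under the null hypothesis. Let $\hat\theta_T$ be the estimator of $\theta_o$
under $H_0$, and let $\hat\delta_T \equiv r(\hat\theta_T)$ be the constrained estimator of $\delta_o$. $\nabla_\theta\hat m_t$ and
$\nabla_\theta\hat v_t$ are the $1 \times P$ gradients of $m_t$ and $v_t$ under $H_0$. Note that $\hat\omega_t = \hat v_t$ and $\hat\mu_t =
\hat m_t$ by definition. The LM test of (3.12) is based on the unrestricted score
of the quasi-log likelihood evaluated at $\hat\delta_T$. The transpose of the score is

$$s_t(\delta)' = \nabla_\delta\mu_t(\delta)'u_t(\delta)/\omega_t(\delta) + \nabla_\delta\omega_t(\delta)'[u_t^2(\delta) - \omega_t(\delta)]/2\omega_t^2(\delta) \tag{3.13}$$

$$= [\nabla_\delta\mu_t(\delta)' \mid \nabla_\delta\omega_t(\delta)'] \begin{bmatrix} 1/\omega_t(\delta) & 0 \\ 0 & 1/[2\omega_t(\delta)^2] \end{bmatrix} \begin{bmatrix} u_t(\delta) \\ u_t^2(\delta) - \omega_t(\delta) \end{bmatrix}. \tag{3.14}$$

Evaluating $s_t$ at $r(\theta)$ gives

$$s_t(r(\theta))' = \Lambda_t(\theta)'C_t(\theta)\phi_t(\theta) \tag{3.15}$$

where $\Lambda_t(\theta)' \equiv [\nabla_\delta\mu_t(r(\theta))' \mid \nabla_\delta\omega_t(r(\theta))']$, $C_t(\theta)$ is the diagonal matrix in
the middle of (3.14) evaluated at $v_t(\theta)$, and $\phi_t(\theta)' \equiv [u_t(r(\theta)),\
u_t^2(r(\theta)) - v_t(\theta)]$. The standardized score evaluated at $r(\hat\theta_T)$ is

$$T^{-1/2}\sum_{t=1}^T s_t(r(\hat\theta_T))' = T^{-1/2}\sum_{t=1}^T \hat\Lambda_t'\hat C_t\hat\phi_t. \tag{3.16}$$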


Under $H_0$ and the assumption of conditional normality, $TR_u^2$ from the regression

$$\hat C_t^{1/2}\hat\phi_t \ \text{ on } \ \hat C_t^{1/2}\hat\Lambda_t, \qquad t=1,\dots,T \tag{3.17}$$

is asymptotically $\chi^2_Q$ where $Q \equiv M - P$ is the number of restrictions under $H_0$.
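Unfortunately, this procedure is invalid under nonnormality, nor does it have
systematic power for detecting nonnormality. Theorem 2.1 suggests a robust
form of the test. In this case,

$$\underset{2\times 1}{\hat\phi_t} \equiv \begin{bmatrix} \hat u_t \\ \hat u_t^2 - \hat v_t \end{bmatrix}, \quad
\underset{2\times 2}{\hat C_t} \equiv \begin{bmatrix} 1/\hat v_t & 0 \\ 0 & 1/[2\hat v_t^2] \end{bmatrix}, \quad
\underset{2\times M}{\hat\Lambda_t} \equiv \begin{bmatrix} \nabla_\delta\hat\mu_t \\ \nabla_\delta\hat\omega_t \end{bmatrix}, \quad
\underset{2\times P}{\hat\Phi_t} \equiv -\begin{bmatrix} \nabla_\theta\hat m_t \\ \nabla_\theta\hat v_t \end{bmatrix}$$

where $\hat u_t \equiv y_t - m_t(\hat\theta_T)$. The transformed quantities are

$$\tilde\phi_t \equiv \begin{bmatrix} \hat u_t/\sqrt{\hat v_t} \\ [\hat u_t^2 - \hat v_t]/(\sqrt{2}\hat v_t) \end{bmatrix}, \quad
\tilde\Lambda_t \equiv \begin{bmatrix} \nabla_\delta\hat\mu_t/\sqrt{\hat v_t} \\ \nabla_\delta\hat\omega_t/(\sqrt{2}\hat v_t) \end{bmatrix}, \quad
\tilde\Phi_t \equiv -\begin{bmatrix} \nabla_\theta\hat m_t/\sqrt{\hat v_t} \\ \nabla_\theta\hat v_t/(\sqrt{2}\hat v_t) \end{bmatrix}.$$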

The robust test statistic is obtained by first running the regression

$$\tilde\Lambda_t \ \text{ on } \ \tilde\Phi_t, \qquad t=1,\dots,T \tag{3.18}$$

and saving the matrix residuals $\{\ddot\Lambda_t: t=1,\dots,T\}$. Then run the regression

$$1 \ \text{ on } \ \tilde\phi_t'\ddot\Lambda_t, \qquad t=1,\dots,T \tag{3.19}$$

and use $TR_u^2 = T - RSS$ as asymptotically $\chi^2_Q$ under $H_0$. Note that the
regression in (3.19) contains perfect multicollinearity since $\ddot\Lambda_t\nabla_\theta r(\hat\theta_T) \equiv 0$,
where $\nabla_\theta r(\theta)$ is the $M \times P$ gradient of $r$. Many regression packages nevertheless
compute a RSS; for those that do not, $P$ regressors can be omitted from
(3.19).
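
The multicollinearity in (3.19) can be seen numerically. The sketch below is an illustration, not the paper's code: the function name `joint_mean_variance_test` and the array layout are assumptions, and $\tilde\Phi_t$ is built from the chain rule $\nabla_\theta m_t = \nabla_\delta\mu_t\nabla_\theta r$, $\nabla_\theta v_t = \nabla_\delta\omega_t\nabla_\theta r$, which follows from $m_t(\theta) = \mu_t(r(\theta))$ and $v_t(\theta) = \omega_t(r(\theta))$. `numpy.linalg.lstsq` returns a minimum-norm solution, so the RSS is still well defined with collinear regressors.

```python
import numpy as np


def joint_mean_variance_test(u, v, grad_mu, grad_om, R):
    """Robust LM-type statistic of Example 3.3 via (3.18)-(3.19).

    u, v    : (T,)   residuals and fitted variances under the null
    grad_mu : (T, M) gradient of mu_t with respect to delta
    grad_om : (T, M) gradient of omega_t with respect to delta
    R       : (M, P) gradient of r at the restricted estimate
    """
    T, M = grad_mu.shape
    P = R.shape[1]
    s1 = 1.0 / np.sqrt(v)
    s2 = 1.0 / (np.sqrt(2.0) * v)
    phi_w = np.column_stack([u * s1, (u ** 2 - v) * s2])        # (T, 2)
    Lam_w = np.stack([grad_mu * s1[:, None],
                      grad_om * s2[:, None]], axis=1)           # (T, 2, M)
    Phi_w = -(Lam_w @ R)                                        # (T, 2, P)

    # Matrix regression (3.18): stack the two rows of every observation.
    X = Phi_w.reshape(2 * T, P)
    Y = Lam_w.reshape(2 * T, M)
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Lam_res = (Y - X @ B).reshape(T, 2, M)

    # Regression (3.19): the M regressors have rank at most M - P.
    Z = np.einsum("tl,tlm->tm", phi_w, Lam_res)
    ones = np.ones(T)
    coef, *_ = np.linalg.lstsq(Z, ones, rcond=None)
    return T - np.sum((ones - Z @ coef) ** 2), Lam_res
```

Multiplying the returned residuals by `R` gives zeros up to rounding error, which is the identity $\ddot\Lambda_t\nabla_\theta r(\hat\theta_T) \equiv 0$ noted above.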

Note that the first order condition for $\hat\theta_T$ is simply

$$\sum_{t=1}^T \hat\Phi_t(\hat\theta_T)'\hat C_t(\hat\theta_T)\hat\phi_t(\hat\theta_T) = 0,$$

so that the robust indicator is asymptotically equivalent to the usual LM
indicator. The matrix regression in (3.18) is the cost to the researcher in
guarding against nonnormality.


Example 3.4: Suppose that $y_t$ is a random scalar censored below zero, and let
$x_{t1}$ be a $1 \times M$ vector of predetermined variables from $x_t$. A popular model for
$y_t$ is the tobit model. Among other things, the tobit model implies that

$$E(y_t|y_t>0,x_t) = x_{t1}\alpha_o + \sigma_o\rho(x_{t1}\alpha_o/\sigma_o) \equiv m_t(\theta_o) \tag{3.20}$$

and

$$V(y_t|y_t>0,x_t) = \sigma_o^2\{1 - (x_{t1}\alpha_o/\sigma_o)\rho(x_{t1}\alpha_o/\sigma_o) - [\rho(x_{t1}\alpha_o/\sigma_o)]^2\} \equiv v_t(\theta_o) \tag{3.21}$$

where $\rho(\cdot)$ is the Mill's ratio and $\theta_o \equiv (\alpha_o',\sigma_o^2)'$. Here $\sigma_o^2$ is the conditional
variance usually associated with the underlying "latent" variable, and $x_{t1}\alpha_o$
is the conditional mean of the latent variable. From a statistical standpoint,
the tobit model is no more sensible than

$$\log y_t\,|\,y_t>0,x_t \ \sim\ N(x_{t1}\beta_o,\eta_o^2) \tag{3.22}$$

((3.22) also seems reasonable for many economic applications; see Cragg [4]).
If (3.22) is valid, $\beta_o$ and $\eta_o^2$ can be estimated by OLS of

$$\log y_t \ \text{ on } \ x_{t1}$$

using only the positive values of $y_t$. Recall that (3.22) implies

$$E(y_t|y_t>0,x_t) = \exp(\eta_o^2/2 + x_{t1}\beta_o) \tag{3.23}$$

and

$$V(y_t|y_t>0,x_t) = [\exp(\eta_o^2) - 1][\exp(\eta_o^2/2 + x_{t1}\beta_o)]^2 \equiv \omega_t(\beta_o,\eta_o^2). \tag{3.24}$$


Let $\hat\alpha_T$, $\hat\sigma_T^2$ be any $\sqrt{T}$-consistent estimators of $\alpha_o$ and $\sigma_o^2$ under $H_0$. These
include the MLE's, Heckman's [12] two-step estimators, and various WNLS
estimators. Define the Davidson-MacKinnon [5] indicator to be the following
weighted difference of the predicted values from (3.23) and (3.20):

$$\hat\Lambda_t \equiv (\hat v_t/\hat\omega_t)\{\exp[\hat\eta_T^2/2 + x_{t1}\hat\beta_T] - [x_{t1}\hat\alpha_T + \hat\sigma_T\rho(x_{t1}\hat\alpha_T/\hat\sigma_T)]\} \tag{3.25}$$

where $\hat v_t \equiv v_t(\hat\alpha_T,\hat\sigma_T^2)$ and $\hat\omega_t \equiv \omega_t(\hat\beta_T,\hat\eta_T^2)$. Then, if the tobit model is true,
$\hat\Lambda_t$ should be asymptotically uncorrelated with

$$\hat u_t \equiv y_t - x_{t1}\hat\alpha_T - \hat\sigma_T\rho(x_{t1}\hat\alpha_T/\hat\sigma_T). \tag{3.26}$$

A test which can be shown to be consistent against the alternative (3.23) is
based on the weighted correlation between $\hat u_t$ and $\hat\Lambda_t$. Unfortunately, the
usual LM statistic is invalid even when the weighting $1/\sqrt{\hat v_t}$ is employed. The
reason is that the estimators $(\hat\alpha_T,\hat\sigma_T)$ need not have been obtained from the
weighted nonlinear least squares problem

$$\min_{\alpha,\sigma} \sum_{t=1}^T (y_t - x_{t1}\alpha - \sigma\rho(x_{t1}\alpha/\sigma))^2/\hat v_t. \tag{3.27}$$

Nevertheless, a statistic is available from Theorem 2.1. Let

$$\phi_t(\alpha,\sigma) \equiv y_t - x_{t1}\alpha - \sigma\rho(x_{t1}\alpha/\sigma) \equiv u_t(\theta)$$

and let $\nabla_\theta\hat m_t$ denote the $1 \times (M+1)$ gradient of $x_{t1}\alpha + \sigma\rho(x_{t1}\alpha/\sigma)$ with respect
to $\alpha$ and $\sigma$, evaluated at $(\hat\alpha_T,\hat\sigma_T)$. Then the following procedure is
asymptotically valid:


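(i) Define $\hat\Lambda_t$ as in (3.25), $\hat u_t$ as in (3.26), and $\nabla_\theta\hat m_t$ as above.
The weighted quantities are $\tilde\Lambda_t \equiv \hat\Lambda_t/\sqrt{\hat v_t}$, $\tilde u_t \equiv \hat u_t/\sqrt{\hat v_t}$, $\nabla_\theta\tilde m_t \equiv \nabla_\theta\hat m_t/\sqrt{\hat v_t}$.

(ii) Run the OLS regression

$$\tilde\Lambda_t \ \text{ on } \ \nabla_\theta\tilde m_t, \qquad t=1,\dots,T$$

and save the residuals $\ddot\Lambda_t$.

(iii) Run the regression

$$1 \ \text{ on } \ \tilde u_t\ddot\Lambda_t, \qquad t=1,\dots,T$$

and use $TR_u^2 = T - RSS$ as asymptotically $\chi^2_1$ under $H_0$.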
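
Step (i) requires the tobit conditional mean (3.20), the variance (3.21), and the gradient of the mean with respect to $(\alpha,\sigma)$. The pure-Python sketch below computes them for one observation. It assumes that $\rho(z)$ is the ratio of the standard normal density to its distribution function, so that $\rho'(z) = -\rho(z)[z + \rho(z)]$; the function names `mills` and `tobit_truncated_moments` are hypothetical.

```python
import math


def mills(z):
    """rho(z) = pdf(z) / cdf(z) for the standard normal (assumed definition).

    For very negative z the cdf underflows, so this simple form is only
    meant for moderate arguments.
    """
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return pdf / cdf


def tobit_truncated_moments(x, alpha, sigma):
    """Return (m_t, v_t, gradient of m_t w.r.t. (alpha, sigma)).

    x, alpha : sequences of length M (one observation of x_{t1}, and alpha)
    sigma    : positive scalar
    """
    xa = sum(xi * ai for xi, ai in zip(x, alpha))
    z = xa / sigma
    rho = mills(z)
    drho = -rho * (z + rho)                 # derivative of rho at z
    m = xa + sigma * rho                    # conditional mean (3.20)
    v = sigma ** 2 * (1.0 - z * rho - rho ** 2)   # conditional variance (3.21)
    # d m / d alpha = x (1 + rho'(z));  d m / d sigma = rho(z) - z rho'(z)
    grad = [xi * (1.0 + drho) for xi in x] + [rho - z * drho]
    return m, v, grad
```

Applying this function observation by observation gives $\hat v_t$ and $\nabla_\theta\hat m_t$; together with $\hat\omega_t$ from (3.24) these are the ingredients of the weighted quantities in step (i).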

This test takes the null hypothesis to be correct specification of the
conditional density of $y_t$ given $x_t$, i.e. the tobit model holds under $H_0$. In
particular, it relies on linearity of the conditional expectation in $x_{t1}$,
conditional homoskedasticity, and conditional normality in the underlying
latent variable model; it is not intended to be robust to departures from any
of these assumptions. Instead the test is devised to have power against
departures from the tobit model that invalidate the conditional expectation
(3.20). Equation (3.20) is of course only one of many consequences of the
tobit model that could be tested. The test is most useful if interest lies
in determining the effect of explanatory variables on positive values of the
dependent variable.

The discussion following Lemma 2.2 implies that (i)-(iii) is
asymptotically equivalent to the Davidson-MacKinnon test for weighted NLS
estimation of (3.20) and (3.23), provided $\hat\alpha_T$ and $\hat\sigma_T$ are $\sqrt{T}$-consistent
estimators. If $\hat\alpha_T$ and $\hat\sigma_T$ are the MLE's then these estimators are more
efficient than the WNLS estimators that solve (3.27). But Heckman's [12] two
step estimators yield a test that is asymptotically equivalent to the test
employing the MLE's or the WNLS estimators. One is essentially doing
specification testing of the nonlinear regression model (3.20) with variance
(3.21). This makes interpretation of procedure (i)-(iii) straightforward.

It  must  be  emphasized  that  if  the  MLE's  are  available  then  a 
Newey-Tauchen-White  regression  (2.13),  which  requires  the  estimated  scores  of 
the  conditional  log-likelihood,  is  available.   The  procedure  obtained  from 
Theorem  2.1  is  more  flexible  albeit  less  efficient. 

A similar test could be based on competing specifications for $E(y_t \mid x_t)$;
that is, the zero as well as positive observations for $y_t$ can be used.  This
would require specifying $P(y_t > 0 \mid x_t)$ in the competing model (3.22) such as in
Cragg [4].

Before leaving this example, it is useful to note that simple tests for
exclusion restrictions can be developed along similar lines.  The
unrestricted mean function for $E(y_t \mid y_t > 0, x_t)$ is

$$x_{t1}\alpha_{o1} + x_{t2}\alpha_{o2} + \sigma_o\rho([x_{t1}\alpha_{o1} + x_{t2}\alpha_{o2}]/\sigma_o). \tag{3.28}$$

The null hypothesis is that $\alpha_{o2} = 0$, under which (3.28) reduces to (3.20).  The
indicator $\hat\lambda_t$ is now the gradient of (3.28) with respect to $\alpha_2$ evaluated at
the restricted estimates:

$$\hat\lambda_t = x_{t2} + \hat\sigma_T \nabla_z\rho(x_{t1}\hat\alpha_{T1}/\hat\sigma_T)\,x_{t2},$$

where $\nabla_z\rho(\cdot)$ is the derivative of the Mills ratio.  When this $\hat\lambda_t$ is used in
(i)-(iii) a test asymptotically equivalent to the LM statistic in the context
of WNLS is obtained.  Again, $\hat\alpha_{T1}$ and $\hat\sigma_T$ are any $\sqrt{T}$-consistent estimators of
$\alpha_{o1}$ and $\sigma_o$.
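
The exclusion-restriction indicator can be checked numerically.  The sketch
below is not taken from the paper: it assumes that $\rho(z) = \phi(z)/\Phi(z)$
(standard normal density over distribution function) and reads
$\sigma\nabla_z\rho(\cdot)$ as the derivative of $\sigma\rho(z/\sigma)$ with respect to
the index $z$, which equals $\rho'(z/\sigma)$.  All function names are
hypothetical.  The analytic gradient of (3.28) with respect to $\alpha_2$ at
$\alpha_2 = 0$ is compared with a central finite difference.

```python
import math
import numpy as np

def norm_pdf(z):
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def rho(z):
    # Assumed form of the ratio: normal pdf over normal cdf.
    return norm_pdf(z) / norm_cdf(z)

def rho_prime(z):
    # Derivative of pdf/cdf: rho'(z) = -rho(z) * (z + rho(z)).
    r = rho(z)
    return -r * (z + r)

def mean_positive(x1, x2, a1, a2, sigma):
    # Mean function (3.28) for the positive observations.
    idx = x1 @ a1 + x2 @ a2
    return idx + sigma * rho(idx / sigma)

def exclusion_indicator(x1, x2, a1, sigma):
    # Gradient of (3.28) with respect to alpha_2, evaluated at alpha_2 = 0.
    z = (x1 @ a1) / sigma
    return x2 * (1.0 + rho_prime(z))[:, None]

# Illustrative values only.
rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=(5, 2)), rng.normal(size=(5, 2))
a1, sigma, eps = np.array([0.5, -0.3]), 1.3, 1e-6
lam = exclusion_indicator(x1, x2, a1, sigma)
for j in range(2):
    e = np.zeros(2)
    e[j] = eps
    fd = (mean_positive(x1, x2, a1, e, sigma)
          - mean_positive(x1, x2, a1, -e, sigma)) / (2.0 * eps)
    print(np.max(np.abs(fd - lam[:, j])))   # finite-difference discrepancy
```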
4.  Conclusions

This paper has developed a general class of regression-based
specification tests for (possibly) dynamic multivariate models which, in the
leading cases, impose under $H_0$ only the hypotheses being tested (correctness
of the conditional mean and/or correctness of the conditional variance).  The
framework can be applied to testing other aspects of a conditional
distribution under a modest number of additional assumptions.  It is hoped
that the computational simplicity of the methods proposed here removes some
of the barriers to using robust test statistics in practice.

The possibility of generating simple test statistics when
$T^{1/2}(\hat\theta_T - \theta_o)$ has a complicated limiting distribution should be useful in
several situations.  The tobit example in Section 3 is only one case where
the conditional mean parameters are estimated using a method other than the
efficient WNLS procedure or the even more efficient MLE.  Another application
is to choosing between log-linear and linear-linear specifications.  In this
case, both models can be estimated by OLS, and then transformed in the manner
of the tobit example to obtain estimates of $E(y_t \mid x_t)$ for the separate models.

Theorem 2.1 applies directly to linear simultaneous equations models
(SEM's).  Computation of $\Phi_t(x_t,\theta_o)$ is straightforward provided that the
reduced form for the endogenous right hand side variables is available.  The
parameter vector $\theta_o$ contains the reduced form parameters of the relevant
endogenous variables as well as the structural parameters in the equation(s)
of interest.  If all equations of a simultaneous system are being tested
jointly then $\theta_o$ is simply all structural parameters.  An immediate application
is to testing for serial correlation in linear dynamic simultaneous equations
models in the presence of heteroskedasticity (conditional or unconditional)
of unknown form.  Tests for multivariate ARCH in SEM's that are robust
to heterokurtosis are also easily constructed.  The scope of applications to
nonlinear SEM's is limited by one's ability to compute
$\Phi_t(x_t,\theta_o) = E[\nabla_\theta\phi_t(y_t,x_t,\theta_o) \mid x_t]$.  This is exactly the problem of
computing the optimal instrumental variables for nonlinear SEM's.

Theorem 2.1 can be extended to certain unit root time series models.
The initial purging of $\hat C_t^{1/2}\hat\Phi_t$ from $\hat C_t^{1/2}\hat\Lambda_t$ can produce indicators $\ddot\Lambda_t$ that are
effectively stationary.  This happens for the LM test in linear time series
models when the regressors excluded under the null hypothesis are
individually cointegrated (in a generalized sense) with the regressors
included under the null.  In this context the statistics derived from Theorem
2.1 have the advantage over the usual Wald or LM tests of being robust to
conditional heteroskedasticity under $H_0$.  Extending Theorem 2.1 to general
nonstationary time series models is left for future research.
Mathematical Appendix

For  convenience,  I  include  a  lemma  that  is  used  repeatedly  in  the  proof 
of  Theorem  2.1. 


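Lemma A.1:  Assume that the sequence of random functions
$\{Q_T(w_T,\theta): \theta \in \Theta,\ T=1,2,\ldots\}$, where $Q_T(w_T,\cdot)$ is continuous on $\Theta$ and $\Theta$ is
a compact subset of $\mathbb{R}^P$, and the sequence of nonrandom functions
$\{\bar Q_T(\theta): \theta \in \Theta,\ T=1,2,\ldots\}$, satisfy the following conditions:

(i)  $\sup_{\theta\in\Theta} |Q_T(w_T,\theta) - \bar Q_T(\theta)| \overset{p}{\to} 0$;

(ii)  $\{\bar Q_T(\theta): \theta \in \Theta,\ T=1,2,\ldots\}$ is continuous on $\Theta$ uniformly in $T$.

Let $\hat\theta_T$ be a sequence of random vectors such that $\hat\theta_T - \theta_T^o \overset{p}{\to} 0$ where
$\{\theta_T^o\} \subset \Theta$.  Then

$$Q_T(w_T,\hat\theta_T) - \bar Q_T(\theta_T^o) \overset{p}{\to} 0.$$

Proof:  see Wooldridge [28, Lemma A.1, p. 229].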

A  definition  simplifies  the  statement  of  the  conditions. 


Definition A.1:  A sequence of random functions $\{q_t(y_t,x_t,\theta): \theta \in \Theta,\ t=1,2,\ldots\}$,
where $q_t(y_t,x_t,\cdot)$ is continuous on $\Theta$ and $\Theta$ is a compact subset of $\mathbb{R}^P$,
is said to satisfy the Uniform Weak Law of Large Numbers (UWLLN) and
Uniform Continuity (UC) conditions provided that

(i)  $\sup_{\theta\in\Theta} \left| T^{-1}\sum_{t=1}^{T} \{ q_t(y_t,x_t,\theta) - E[q_t(y_t,x_t,\theta)] \} \right| \overset{p}{\to} 0$

and

(ii)  $\{ T^{-1}\sum_{t=1}^{T} E[q_t(y_t,x_t,\theta)]: \theta \in \Theta,\ T=1,2,\ldots \}$ is $O(1)$ and
continuous on $\Theta$ uniformly in $T$.



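In the statement of the conditions, the dependence of functions on the
variables $y_t$ and $x_t$ is frequently suppressed for notational convenience.  If
$a(\theta)$ is a $1 \times L$ function of the $P \times 1$ vector $\theta$ then, by convention, $\nabla_\theta a(\theta)$ is the
$L \times P$ matrix $\nabla_\theta[a(\theta)']$.  If $\Lambda(\theta)$ is a $Q \times L$ matrix then the matrix $\nabla_\theta \Lambda(\theta)$ is the
$LQ \times P$ matrix defined as

$$\nabla_\theta \Lambda(\theta)' \equiv [\nabla_\theta \Lambda_1(\theta)' \mid \cdots \mid \nabla_\theta \Lambda_Q(\theta)']$$

where $\Lambda_j(\theta)$ is the jth row of $\Lambda(\theta)$ and $\nabla_\theta \Lambda_j(\theta)$ is the $L \times P$ gradient of $\Lambda_j(\theta)$
as defined above.  Also, for any $L \times 1$ vector function $\phi$, define the second
derivative of $\phi$ to be the $LP \times P$ matrix

$$\nabla^2_\theta \phi(\theta) \equiv \nabla_\theta[\nabla_\theta \phi(\theta)'].$$

Finally, define the parameter vector $\delta \equiv (\theta', \pi')'$.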

Conditions A.1:

(i)  $\Theta \subset \mathbb{R}^P$ and $\Pi \subset \mathbb{R}^N$ are compact and have nonempty interiors;

(ii)  $\theta_o \in \mathrm{int}(\Theta)$, $\{\pi_T^o: T=1,2,\ldots\} \subset \mathrm{int}(\Pi)$ uniformly in $T$;

(iii)  (a)  $\{\phi_t(y_t,x_t,\theta): \theta \in \Theta\}$ is a sequence of $L \times 1$ functions such
that $\phi_t(\cdot,\theta)$ is Borel measurable for each $\theta \in \Theta$ and $\phi_t(y_t,x_t,\cdot)$ is
continuously differentiable on the interior of $\Theta$ for all $y_t, x_t$, $t=1,2,\ldots$;

(b)  Define $\Phi_t(x_t,\theta_o) \equiv E[\nabla_\theta \phi_t(y_t,x_t,\theta_o) \mid x_t]$ for all $\theta_o \in \mathrm{int}(\Theta)$.
Assume that $\Phi_t(x_t,\cdot)$ is continuously differentiable on the
interior of $\Theta$ for all $x_t$, $t=1,2,\ldots$;

(c)  $\{C_t(x_t,\delta): \delta \in \Delta\}$ is a sequence of $L \times L$ matrices satisfying
the measurability requirements, $C_t(x_t,\delta)$ is symmetric and positive
semi-definite for all $x_t$ and $\delta$, and $C_t(x_t,\cdot)$ is differentiable on $\mathrm{int}(\Delta)$
for all $x_t$, $t=1,2,\ldots$;

(d)  $\{\Lambda_t(x_t,\delta): \delta \in \Delta\}$ is a sequence of $L \times Q$ matrices satisfying
the measurability requirements, and $\Lambda_t(x_t,\cdot)$ is differentiable on $\mathrm{int}(\Delta)$
for all $x_t$, $t=1,2,\ldots$;

(iv)  (a)  $T^{1/2}(\hat\theta_T - \theta_o) = O_p(1)$;

(b)  $T^{1/2}(\hat\pi_T - \pi_T^o) = O_p(1)$;
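(v)  (a)  $\{\Phi_t(\theta)'C_t(\delta)\Phi_t(\theta)\}$ and $\{\Phi_t(\theta)'C_t(\delta)\Lambda_t(\delta)\}$ satisfy the UWLLN
and UC conditions;

(b)  $\{T^{-1}\sum_{t=1}^{T} E[\Phi_t^{o\prime} C_t^o \Phi_t^o]\}$ is uniformly positive definite;

(vi)  (a)  $\{\Phi_t(\theta)'C_t(\delta)\nabla_\theta\phi_t(\theta)\}$, $\{[I_P \otimes \phi_t(\theta)'C_t(\delta)]\nabla_\theta\Phi_t(\theta)'\}$, and
$\{\Phi_t(\theta)'[I_L \otimes \phi_t(\theta)']\nabla_\delta C_t(\delta)\}$ satisfy the UWLLN and UC conditions;

(b)  $T^{-1/2}\sum_{t=1}^{T} \Phi_t^{o\prime} C_t^o \phi_t^o = O_p(1)$;

(vii)  (a)  $\{\Lambda_t(\delta)'C_t(\delta)\nabla_\theta\phi_t(\theta)\}$, $\{[I_Q \otimes \phi_t(\theta)'C_t(\delta)]\nabla_\delta\Lambda_t(\delta)'\}$,
$\{\Lambda_t(\delta)'[I_L \otimes \phi_t(\theta)']\nabla_\delta C_t(\delta)\}$, and $\{\Phi_t(\theta)'[I_L \otimes \phi_t(\theta)']\nabla_\delta C_t(\delta)\}$ satisfy the
UWLLN and UC requirements;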

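(viii)  (a)  $\{\Xi_T^o \equiv T^{-1}\sum_{t=1}^{T} E[(\Lambda_t^o - \Phi_t^o B_T^o)'C_t^o\phi_t^o\phi_t^{o\prime}C_t^o(\Lambda_t^o - \Phi_t^o B_T^o)]\}$ is uniformly
p.d.;

(b)  $\Xi_T^{o\,-1/2}\, T^{-1/2}\sum_{t=1}^{T} (\Lambda_t^o - \Phi_t^o B_T^o)'C_t^o\phi_t^o \overset{d}{\to} N(0, I_Q)$;

(c)  $\{\Lambda_t(\delta)'C_t(\delta)\phi_t(\theta)\phi_t(\theta)'C_t(\delta)\Lambda_t(\delta)\}$,
$\{\Lambda_t(\delta)'C_t(\delta)\phi_t(\theta)\phi_t(\theta)'C_t(\delta)\Phi_t(\theta)\}$, and $\{\Phi_t(\theta)'C_t(\delta)\phi_t(\theta)\phi_t(\theta)'C_t(\delta)\Phi_t(\theta)\}$
satisfy the UWLLN and UC conditions.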
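Proof of Theorem 2.1:  First, note that assumptions (i)-(vi) ensure existence
of $\hat B_T$ and imply that $\hat B_T - B_T^o = o_p(1)$ by Lemma A.1.  Therefore,

$$\begin{aligned}
\hat\xi_T &= T^{-1/2}\sum_{t=1}^{T} [\hat\Lambda_t - \hat\Phi_t B_T^o]'\hat C_t\hat\phi_t \\
&\quad - (\hat B_T - B_T^o)'\, T^{-1/2}\sum_{t=1}^{T} \hat\Phi_t'\hat C_t\hat\phi_t.
\end{aligned} \tag{a.1}$$

Consider the term post-multiplying $(\hat B_T - B_T^o)'$.  A standard mean
value expansion about $\delta_T^o$, assumption (vi.a), and Lemma A.1 yield

$$\begin{aligned}
T^{-1/2}\sum_{t=1}^{T} \hat\Phi_t'\hat C_t\hat\phi_t &= T^{-1/2}\sum_{t=1}^{T} \Phi_t^{o\prime}C_t^o\phi_t^o \\
&\quad + T^{-1}\sum_{t=1}^{T} \{\Phi_t^{o\prime}C_t^o\nabla_\theta\phi_t^o + [I_P \otimes \phi_t^{o\prime}C_t^o]\nabla_\theta\Phi_t^{o\prime}\}\, T^{1/2}(\hat\theta_T - \theta_o) \\
&\quad + T^{-1}\sum_{t=1}^{T} \{\Phi_t^{o\prime}[I_L \otimes \phi_t^{o\prime}]\nabla_\delta C_t^o\}\, T^{1/2}(\hat\delta_T - \delta_T^o) + o_p(1).
\end{aligned} \tag{a.2}$$

The first term on the right hand side of (a.2) is $O_p(1)$ by (vi.b).  By (vi.a)
and (iv.a,b), the terms in lines two and three of (a.2) are also $O_p(1)$.
Therefore,

$$T^{-1/2}\sum_{t=1}^{T} \hat\Phi_t'\hat C_t\hat\phi_t = O_p(1). \tag{a.3}$$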

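Along with $\hat B_T - B_T^o = o_p(1)$, this establishes that under $H_0$,

$$\hat\xi_T = T^{-1/2}\sum_{t=1}^{T} [\hat\Lambda_t - \hat\Phi_t B_T^o]'\hat C_t\hat\phi_t + o_p(1). \tag{a.4}$$

A mean value expansion, assumption (vii), and Lemma A.1 yield

$$\begin{aligned}
\hat\xi_T &= T^{-1/2}\sum_{t=1}^{T} [\Lambda_t^o - \Phi_t^o B_T^o]'C_t^o\phi_t^o \\
&\quad + T^{-1}\sum_{t=1}^{T} \{[\Lambda_t^o - \Phi_t^o B_T^o]'C_t^o\nabla_\theta\phi_t^o - B_T^{o\prime}[I_P \otimes \phi_t^{o\prime}C_t^o]\nabla_\theta\Phi_t^{o\prime}\}\, T^{1/2}(\hat\theta_T - \theta_o) \\
&\quad + T^{-1}\sum_{t=1}^{T} \{[I_Q \otimes \phi_t^{o\prime}C_t^o]\nabla_\delta\Lambda_t^{o\prime} + [\Lambda_t^o - \Phi_t^o B_T^o]'[I_L \otimes \phi_t^{o\prime}]\nabla_\delta C_t^o\}\, T^{1/2}(\hat\delta_T - \delta_T^o) \\
&\quad + o_p(1).
\end{aligned} \tag{a.5}$$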

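Consider the second line of (a.5).  It must be shown that the average
appearing there is $o_p(1)$ under $H_0$.  First, note that by definition of $\Phi_t$ and
the law of iterated expectations,

$$\begin{aligned}
E\{[\Lambda_t^o - \Phi_t^o B_T^o]'C_t^o\nabla_\theta\phi_t^o\} &= E\{E([\Lambda_t^o - \Phi_t^o B_T^o]'C_t^o\nabla_\theta\phi_t^o \mid x_t)\} \\
&= E\{[\Lambda_t^o - \Phi_t^o B_T^o]'C_t^o\Phi_t^o\}.
\end{aligned} \tag{a.6}$$

Therefore,

$$T^{-1}\sum_{t=1}^{T} E\{[\Lambda_t^o - \Phi_t^o B_T^o]'C_t^o\nabla_\theta\phi_t^o\} = T^{-1}\sum_{t=1}^{T} E\{[\Lambda_t^o - \Phi_t^o B_T^o]'C_t^o\Phi_t^o\} = 0 \tag{a.7}$$

by definition of $B_T^o$.  The regularity conditions imposed imply that each of
the averages appearing in (a.7) satisfies the WLLN.  Therefore

$$T^{-1}\sum_{t=1}^{T} [\Lambda_t^o - \Phi_t^o B_T^o]'C_t^o\nabla_\theta\phi_t^o = o_p(1). \tag{a.8}$$

Because $E[\phi_t^o \mid x_t] = 0$ under $H_0$, it is even easier to show that the remaining
sample averages in (a.5) are $o_p(1)$.  Combined with $T^{1/2}(\hat\delta_T - \delta_T^o) = O_p(1)$ this
establishes the first conclusion of the theorem:

$$\hat\xi_T = T^{-1/2}\sum_{t=1}^{T} [\Lambda_t^o - \Phi_t^o B_T^o]'C_t^o\phi_t^o + o_p(1). \tag{a.9}$$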

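Given (viii.a), the asymptotic covariance matrix of $\hat\xi_T$ is uniformly positive
definite.  Moreover, $\Xi_T^{o\,-1/2}\hat\xi_T \overset{d}{\to} N(0, I_Q)$ under $H_0$ by (viii.b).  Condition
(viii.c) ensures that

$$\hat\Xi_T \equiv T^{-1}\sum_{t=1}^{T} [(\hat\Lambda_t - \hat\Phi_t\hat B_T)'\hat C_t\hat\phi_t\hat\phi_t'\hat C_t(\hat\Lambda_t - \hat\Phi_t\hat B_T)] \tag{a.10}$$

is a consistent estimator of $\Xi_T^o$.  It is easy to see that

$$\hat\xi_T'\,\hat\Xi_T^{-1}\,\hat\xi_T = TR_u^2, \tag{a.11}$$

where $R_u^2$ is the uncentered r-squared from the regression

$$1 \ \text{on} \ \tilde\phi_t'\ddot\Lambda_t, \qquad t=1,\ldots,T, \tag{a.12}$$

and $\tilde\phi_t$ and $\ddot\Lambda_t$ are as defined in the text.  Because the dependent variable in
regression (a.12) is unity, $TR_u^2 = T - \mathrm{RSS}$, where RSS is the residual sum of
squares from the regression (a.12).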
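
The identity in (a.11) can be verified numerically.  The following is a
minimal sketch on synthetic data, not the paper's code: the function name and
the array layout (`phi` is T x L, `Phi` is T x L x P, `Lam` is T x L x Q, `C`
is T x L x L with each slice symmetric positive semi-definite) are
assumptions made for illustration.  It forms the quadratic form
$\hat\xi_T'\hat\Xi_T^{-1}\hat\xi_T$ directly and compares it with $T - \mathrm{RSS}$ from
the regression of 1 on the rows $\hat\phi_t'\hat C_t(\hat\Lambda_t - \hat\Phi_t\hat B_T)$.

```python
import numpy as np

def quadratic_form_and_regression(phi, Phi, Lam, C):
    """Return (xi' Xi^{-1} xi, T - RSS) for user-supplied arrays."""
    T = phi.shape[0]
    # B_hat = (sum Phi' C Phi)^{-1} (sum Phi' C Lam): the matrix regression.
    A = np.einsum('tlp,tlm,tmq->pq', Phi, C, Phi)
    b = np.einsum('tlp,tlm,tmq->pq', Phi, C, Lam)
    B = np.linalg.solve(A, b)
    resid = Lam - np.einsum('tlp,pq->tlq', Phi, B)       # Lam_t - Phi_t B_hat
    # Row t of z is phi_t' C_t (Lam_t - Phi_t B_hat), a 1 x Q vector.
    z = np.einsum('tlq,tlm,tm->tq', resid, C, phi)
    xi = z.sum(axis=0) / np.sqrt(T)
    Xi = z.T @ z / T
    quad = xi @ np.linalg.solve(Xi, xi)
    # Regression of 1 on z: uncentered T*R^2 equals T - RSS.
    ones = np.ones(T)
    coef, *_ = np.linalg.lstsq(z, ones, rcond=None)
    rss = np.sum((ones - z @ coef) ** 2)
    return quad, T - rss

# Synthetic inputs, for illustration only.
rng = np.random.default_rng(2)
T, L, P, Q = 60, 2, 3, 2
phi = rng.normal(size=(T, L))
Phi = rng.normal(size=(T, L, P))
Lam = rng.normal(size=(T, L, Q))
M = rng.normal(size=(T, L, L))
C = M @ M.transpose(0, 2, 1)                             # PSD weighting matrices
quad, tr2 = quadratic_form_and_regression(phi, Phi, Lam, C)
print(quad, tr2)   # the two numbers agree up to rounding error
```

The agreement is purely algebraic: with $Z$ the T x Q matrix of regressors,
both quantities equal $1'Z(Z'Z)^{-1}Z'1$.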